I have a dataframe as shown below:
Slot  Time  Last  Next
1     9:30  -     9:37
2     9:35  9:32  9:40
3     9:40  9:37  9:52
4     9:45  9:41  9:47
5     9:50  9:47  10:00
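What I want to do here is create two new columns, 'min' and 'max', such that 'min' outputs the Slot of the latest Time before 'Last', and 'max' does the same for 'Next'. The desired output here should be:
df['min'] = [NaN,1,2,3,4]
and
df['max'] = [2,2,5,4,5]
I tried something like this:
for index, row in df.iterrows():
    row['min'] = df[df['Time'] < row['Last']]['Slot']
but I got an empty list. Any help is greatly appreciated. Thanks!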
Answer 0 (score: 2)
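First, I converted the time columns to datetime format; otherwise the values are compared as strings, character by character, so '10:00' would sort before '9:30':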
import numpy as np
import pandas as pd

df = df_.copy()  # df_ is the original frame from the question
df.loc[:, 'Time':'Next'] = (df.loc[:, 'Time':'Next']
                            .apply(pd.to_datetime, errors='coerce'))
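To see why the conversion matters, here is a minimal sketch that uses only the standard library. `parse_hm` is a hypothetical helper introduced for illustration; it is not part of the answer's code.

```python
from datetime import datetime

def parse_hm(text):
    """Parse an 'H:MM' string into a datetime (the date part is a dummy default)."""
    return datetime.strptime(text, "%H:%M")

# Compared as strings, '9:50' sorts after '10:00' because '9' > '1'.
print("9:50" < "10:00")                       # False
# Compared as parsed times, the order is the expected one.
print(parse_hm("9:50") < parse_hm("10:00"))   # True
```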
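For the min column, you can do the following:
# Label of the last row whose Time is strictly before each 'Last' value (NaN if none).
min_vals = [(df['Time'] < x)[::-1].idxmax()
            if any(df['Time'] < x) else np.nan for x in df['Last']]
# reindex maps the NaN label to NaN; .loc raises KeyError on missing labels in pandas >= 1.0.
df_['min'] = df['Slot'].reindex(min_vals).values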
For max:
max_vals = [(df['Time'] < x)[::-1].idxmax()
            if any(df['Time'] < x) else np.nan for x in df['Next']]
# reindex is used instead of .loc so that a NaN label would not raise KeyError.
df_.loc[:, 'max'] = df['Slot'].reindex(max_vals).values
Which gives you:
print(df_)
   Slot  Time  Last   Next  min  max
0     1  9:30     -   9:37  NaN    2
1     2  9:35  9:32   9:40  1.0    2
2     3  9:40  9:37   9:52  2.0    5
3     4  9:45  9:41   9:47  3.0    4
4     5  9:50  9:47  10:00  4.0    5
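If Time is sorted in ascending order (an assumption, though it holds for the sample data), the same lookup can be done by binary search rather than by scanning the whole column for every row. The sketch below uses the standard library's `bisect` on plain minute counts. `to_minutes` and `last_slot_before` are hypothetical helper names, and this is an alternative technique, not the method used in the answer above.

```python
from bisect import bisect_left

def to_minutes(text):
    """Convert 'H:MM' to minutes since midnight; '-' or None means missing."""
    if text is None or text == "-":
        return None
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)

def last_slot_before(sorted_times, slots, value):
    """Return the slot of the last time strictly before value, else None."""
    if value is None:
        return None
    pos = bisect_left(sorted_times, value)   # number of times strictly < value
    return slots[pos - 1] if pos > 0 else None

slots = [1, 2, 3, 4, 5]
times = [to_minutes(t) for t in ["9:30", "9:35", "9:40", "9:45", "9:50"]]
last = [to_minutes(t) for t in ["-", "9:32", "9:37", "9:41", "9:47"]]
nxt = [to_minutes(t) for t in ["9:37", "9:40", "9:52", "9:47", "10:00"]]

min_col = [last_slot_before(times, slots, v) for v in last]
max_col = [last_slot_before(times, slots, v) for v in nxt]
print(min_col)   # [None, 1, 2, 3, 4]
print(max_col)   # [2, 2, 5, 4, 5]
```

With NumPy arrays, `np.searchsorted(times, values, side='left')` plays the same role as `bisect_left`; missing values still need to be masked separately.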
Answer 1 (score: 1)
I tried the following:
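import numpy as np

# Assumes Time, Last and Next were already converted to datetimes (see Answer 0).
x = []
y = []
for index, row in df.iterrows():
    # Slots of all rows whose Time is strictly earlier than this row's Last / Next
    t = df[df['Time'] < row['Last']]['Slot'].values
    s = df[df['Time'] < row['Next']]['Slot'].values
    # Keep the last matching slot, or NaN when there is none
    if len(t) == 0:
        x.append(np.nan)
    else:
        x.append(t[-1])
    if len(s) == 0:
        y.append(np.nan)
    else:
        y.append(s[-1])
df['min'] = x
df['max'] = y
print(df)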
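Note: this is not the idiomatic pandas way to solve the problem. Since you were attempting a loop, I am offering a loop-based solution. Its performance will lag behind.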
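A likely reason the loop in the question had no visible effect is that `iterrows` typically hands out a copy of each row, so assigning to `row` does not write back to the frame. Below is a minimal sketch with a made-up three-row frame; the variable names `touched` and `values` are illustrative and not from the answers.

```python
import pandas as pd

df = pd.DataFrame({"Slot": [1, 2, 3], "Time": ["9:30", "9:35", "9:40"]})

# Writing to `row` changes only the temporary Series handed out by iterrows.
for index, row in df.iterrows():
    row["min"] = 0

touched = "min" in df.columns
print(touched)               # False: the frame itself was never modified

# Collect the values in a list and assign the whole column once instead.
values = []
for index, row in df.iterrows():
    values.append(row["Slot"] - 1)
df["min"] = values
print(df["min"].tolist())    # [0, 1, 2]
```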
Answer 2 (score: 1)
This is a case where numba can help provide an efficient solution. It is an explicit for loop, but JIT-compiled for performance.
import numpy as np
import pandas as pd
from numba import njit

# convert to timedelta
time_cols = ['Time','Last','Next']
df[time_cols] = (df[time_cols] + ':00').apply(pd.to_timedelta)

# define loopy algorithm
@njit
def get_idx(times, comps, slots):
    n = len(times)
    res = np.empty(n)
    for i in range(n):
        mycomp = comps[i]
        if mycomp != mycomp:  # NaN check: NaN is never equal to itself
            res[i] = np.nan
        else:
            # scan backwards for the last time strictly before mycomp
            for j in range(n, 0, -1):
                if times[j-1] < mycomp:
                    res[i] = slots[j-1]
                    break
            else:  # loop ended without break: no earlier time exists
                res[i] = np.nan
    return res

# extract timedeltas as seconds
arr = df[time_cols].apply(lambda x: x.dt.total_seconds()).values

# apply logic
df['min'] = get_idx(arr[:, 0], arr[:, 1], df['Slot'].values)
df['max'] = get_idx(arr[:, 0], arr[:, 2], df['Slot'].values)
Result
print(df)
   Slot      Time      Last      Next  min  max
0     1  09:30:00       NaT  09:37:00  NaN  2.0
1     2  09:35:00  09:32:00  09:40:00  1.0  2.0
2     3  09:40:00  09:37:00  09:52:00  2.0  5.0
3     4  09:45:00  09:41:00  09:47:00  3.0  4.0
4     5  09:50:00  09:47:00  10:00:00  4.0  5.0
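The loop relies on two idioms that are easy to miss: a value that is unequal to itself must be NaN, and the `else` branch of a `for` loop runs only when the loop finishes without `break`. The following pure-Python mirror of the logic, without numba and on plain lists, is a sketch for checking the behaviour by hand. `get_idx_py` is a hypothetical name, and the inputs are the sample Time and Last columns expressed in seconds.

```python
import math

def get_idx_py(times, comps, slots):
    """Pure-Python mirror of the backward scan (no numba, plain lists)."""
    res = []
    for comp in comps:
        if comp != comp:                 # True only for NaN
            res.append(math.nan)
            continue
        for j in range(len(times) - 1, -1, -1):   # scan from the last row
            if times[j] < comp:
                res.append(slots[j])
                break
        else:                            # no break: nothing is earlier than comp
            res.append(math.nan)
    return res

slots = [1, 2, 3, 4, 5]
times = [34200.0, 34500.0, 34800.0, 35100.0, 35400.0]   # 9:30 ... 9:50
last = [math.nan, 34320.0, 34620.0, 34860.0, 35220.0]   # missing, 9:32 ... 9:47
print(get_idx_py(times, last, slots))   # [nan, 1, 2, 3, 4]
```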
Performance benchmarking
You can see a large performance improvement for larger dataframes:
def nix(df):
    min_vals = [(df['Time'] < x)[::-1].idxmax()
                if any(df['Time'] < x) else np.nan for x in df['Last']]
    df['min'] = df.loc[min_vals,'Slot'].values
    max_vals = [(df['Time'] < x)[::-1].idxmax()
                if any(df['Time'] < x) else np.nan for x in df['Next']]
    df.loc[:,'max'] = df.loc[max_vals,'Slot'].values
    return df

def jpp(df):
    arr = df[time_cols].apply(lambda x: x.dt.total_seconds()).values
    df['min'] = get_idx(arr[:, 0], arr[:, 1], df['Slot'].values)
    df['max'] = get_idx(arr[:, 0], arr[:, 2], df['Slot'].values)
    return df

df = pd.concat([df]*1000, ignore_index=True)

%timeit nix(df.copy())  # 8.85 s per loop
%timeit jpp(df.copy())  # 5.02 ms per loop
Related: Efficiently return the index of the first value satisfying condition in array.