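I have a problem: the code below is very slow. I'm fairly new to Python and Pandas, so I don't know where to start.
I want to find the predecessor and the successor of each row within its Case.
Currently I loop over every row and select the rows that satisfy my conditions. From those Series I then take the maximum (predecessor) and the minimum (successor).
I have the following records: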
index Case Button Start rowNow
0 x a 2017-12-06 10:17:43.227 0
1 x b 2017-12-06 10:17:44.876 1
2 x c 2017-12-06 10:17:45.719 2
3 y a 2017-12-06 15:28:57.500 3
4 y e 2017-12-06 15:29:19.079 4
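To make the snippets in this thread reproducible, the sample records above can be loaded into a DataFrame named df roughly as follows. This is a sketch: it assumes Start should be parsed with pd.to_datetime and that rowNow is a plain integer column. The construction code itself is not from the question.

```python
import pandas as pd

# Sample records from the question (assumed layout: one row per button press)
df = pd.DataFrame({
    'Case': ['x', 'x', 'x', 'y', 'y'],
    'Button': ['a', 'b', 'c', 'a', 'e'],
    'Start': pd.to_datetime([
        '2017-12-06 10:17:43.227',
        '2017-12-06 10:17:44.876',
        '2017-12-06 10:17:45.719',
        '2017-12-06 15:28:57.500',
        '2017-12-06 15:29:19.079',
    ]),
    'rowNow': [0, 1, 2, 3, 4],
})
print(df.dtypes)
```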
This is what I want to get:
index Case Button Start rowNow prevNum nextNum
0 x a 2017-12-06 10:17:43.227 0 NaN 1
1 x b 2017-12-06 10:17:44.876 1 0 2
2 x c 2017-12-06 10:17:45.719 2 1 NaN
3 y a 2017-12-06 15:28:57.500 3 NaN 4
4 y e 2017-12-06 15:29:19.079 4 3 NaN
Can anyone give me some tips on how to speed this code up? Can it be fully vectorized?
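# Slow: for every row, the whole DataFrame is filtered again
for index, row in df.iterrows():
    # Rows of the same Case with a smaller rowNow that did not start later
    x = df[(df['Case'] == row['Case']) & (df['rowNow'] < row['rowNow']) & (row['Start'] >= df['Start'])]
    df.loc[index, 'prevNum'] = x['rowNow'].max()
    # Rows of the same Case with a larger rowNow that did not start earlier
    y = df[(df['Case'] == row['Case']) & (df['rowNow'] > row['rowNow']) & (row['Start'] <= df['Start'])]
    df.loc[index, 'nextNum'] = y['rowNow'].min()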
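For reference, here is a self-contained version of the loop above, wrapped in a function (the name neighbours_loop is hypothetical) and run on the sample records. It keeps the same three conditions per row, so it still builds full-length boolean masks for every row; that is roughly quadratic in the number of rows and the likely reason the original is slow. It is meant only as a baseline to check faster versions against, not as a fix.

```python
import numpy as np
import pandas as pd

def neighbours_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Row-by-row baseline: rescans the whole frame once per row (O(n^2))."""
    out = df.copy()
    out['prevNum'] = np.nan
    out['nextNum'] = np.nan
    for index, row in df.iterrows():
        same_case = df['Case'] == row['Case']
        earlier = same_case & (df['rowNow'] < row['rowNow']) & (df['Start'] <= row['Start'])
        later = same_case & (df['rowNow'] > row['rowNow']) & (df['Start'] >= row['Start'])
        # max()/min() over an empty selection is NaN, i.e. "no neighbour"
        out.loc[index, 'prevNum'] = df.loc[earlier, 'rowNow'].max()
        out.loc[index, 'nextNum'] = df.loc[later, 'rowNow'].min()
    return out

# Hypothetical sample frame mirroring the question's records
sample = pd.DataFrame({
    'Case': ['x', 'x', 'x', 'y', 'y'],
    'Start': pd.to_datetime([
        '2017-12-06 10:17:43.227', '2017-12-06 10:17:44.876',
        '2017-12-06 10:17:45.719', '2017-12-06 15:28:57.500',
        '2017-12-06 15:29:19.079',
    ]),
    'rowNow': [0, 1, 2, 3, 4],
})
result = neighbours_loop(sample)
print(result)
```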
Answer 0 (score: 1)
Try:
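import numpy as np
import pandas as pd

df['Start'] = pd.to_datetime(df['Start'])
# Take the neighbouring rowNow values
df['prevNum'] = df['rowNow'].shift()
df['nextNum'] = df['rowNow'].shift(-1)
# Blank out a neighbour whose Start hour differs (treated as a different case);
# np.nan replaces pd.np.nan, which was removed in pandas 2.0
df.loc[df['Start'].dt.hour != df['Start'].shift().dt.hour, 'prevNum'] = np.nan
df.loc[df['Start'].dt.hour != df['Start'].shift(-1).dt.hour, 'nextNum'] = np.nan
print(df)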
If the Start column is not already in datetime format, run the following before everything else:
df['Start'] = pd.to_datetime(df['Start'])
Output:
index Case Button Start rowNow prevNum nextNum
0 x a 2017-12-06 2018-09-11 10:17:43.227 0 NaN 1.0
1 x b 2017-12-06 2018-09-11 10:17:44.876 1 0.0 2.0
2 x c 2017-12-06 2018-09-11 10:17:45.719 2 1.0 NaN
3 y a 2017-12-06 2018-09-11 15:28:57.500 3 NaN 4.0
4 y e 2017-12-06 2018-09-11 15:29:19.079 4 3.0 NaN
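One caveat about this answer: it never looks at Case. It treats a change in the hour of Start as the boundary between cases, which matches the sample only because x and y start in different hours. The minimal sketch below uses hypothetical data in which two cases start within the same hour, and compares the hour-based predecessor with one whose boundary comes from Case. The converse also holds: a single case that crosses an hour boundary (for example 10:59:59 to 11:00:01) would be split in two.

```python
import numpy as np
import pandas as pd

# Hypothetical data: case y starts in the same hour (10:xx) as case x
df = pd.DataFrame({
    'Case': ['x', 'x', 'y', 'y'],
    'Start': pd.to_datetime(['2017-12-06 10:17:43', '2017-12-06 10:17:44',
                             '2017-12-06 10:30:00', '2017-12-06 10:31:00']),
    'rowNow': [0, 1, 2, 3],
})

# Predecessor as in the answer: blanked out only when the hour changes
by_hour = df['rowNow'].shift()
by_hour.loc[df['Start'].dt.hour != df['Start'].shift().dt.hour] = np.nan

# Predecessor with the boundary taken from Case instead
by_case = df.groupby('Case')['rowNow'].shift()

print(pd.DataFrame({'by_hour': by_hour, 'by_case': by_case}))
```

Row 2 (the first y row) gets 1.0 from the hour rule, even though row 1 belongs to case x, while the Case rule correctly gives NaN.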
Answer 1 (score: 1)
Try this:
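# Shift rowNow within each Case; the first/last row of each Case gets NaN.
# (Using groupby(...)['rowNow'].shift avoids apply, whose result gains a
# group-key index level in pandas 2.0 and no longer aligns with df.)
df['prevNum'] = df.groupby('Case')['rowNow'].shift(1)
df['nextNum'] = df.groupby('Case')['rowNow'].shift(-1)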
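A variant that also respects the Start ordering used in the question's loop: sort by Case and Start first, then shift rowNow within each Case, and let index alignment put the shifted values back on the original rows. This is a sketch that assumes rowNow increases with Start inside each Case (true for the sample). Under that assumption, the previous row in time is also the one with the largest smaller rowNow, which is what the loop computes.

```python
import pandas as pd

df = pd.DataFrame({
    'Case': ['x', 'x', 'x', 'y', 'y'],
    'Button': ['a', 'b', 'c', 'a', 'e'],
    'Start': pd.to_datetime([
        '2017-12-06 10:17:43.227', '2017-12-06 10:17:44.876',
        '2017-12-06 10:17:45.719', '2017-12-06 15:28:57.500',
        '2017-12-06 15:29:19.079',
    ]),
    'rowNow': [0, 1, 2, 3, 4],
})

# Put each Case in chronological order (rowNow breaks ties in Start)
ordered = df.sort_values(['Case', 'Start', 'rowNow'])
by_case = ordered.groupby('Case')['rowNow']

# Shift within each Case; assignment aligns on the index, so the values
# land back on their original rows even though 'ordered' was re-sorted
df['prevNum'] = by_case.shift(1)
df['nextNum'] = by_case.shift(-1)
print(df)
```

Everything here is vectorized: one sort plus two grouped shifts, instead of two full-frame filters per row.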