A faster method than iterating with conditions (predecessor and successor of each row)

Time: 2018-09-11 08:54:27

Tags: python pandas performance iteration vectorization

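My problem is that the code below is very slow. I have not been using Python and Pandas for long, so I don't know where to start.

I want to determine the predecessor and successor of each row.

Currently I iterate over every row and select the rows that satisfy my conditions. From each selection I take the maximum (for the predecessor) or the minimum (for the successor) of rowNow.

I have the following records: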

index   Case    Button      Start                       rowNow
0       x       a           2017-12-06 10:17:43.227     0
1       x       b           2017-12-06 10:17:44.876     1
2       x       c           2017-12-06 10:17:45.719     2
3       y       a           2017-12-06 15:28:57.500     3
4       y       e           2017-12-06 15:29:19.079     4

I want to get this:

index   Case    Button      Start                       rowNow  prevNum nextNum
0       x       a           2017-12-06 10:17:43.227     0       NaN     1
1       x       b           2017-12-06 10:17:44.876     1       0       2
2       x       c           2017-12-06 10:17:45.719     2       1       NaN
3       y       a           2017-12-06 15:28:57.500     3       NaN     4
4       y       e           2017-12-06 15:29:19.079     4       3       NaN

Can someone give me some hints on how to speed up this code? Can it be fully vectorized?

for index, row in df.iterrows():
    # Earlier rows of the same Case that did not start later than this row
    x = df[(df['Case'] == row['Case']) & (df['rowNow'] < row['rowNow']) & (row['Start'] >= df['Start'])]
    df.loc[index, 'prevNum'] = x['rowNow'].max()
    # Later rows of the same Case that did not start earlier than this row
    y = df[(df['Case'] == row['Case']) & (df['rowNow'] > row['rowNow']) & (row['Start'] <= df['Start'])]
    df.loc[index, 'nextNum'] = y['rowNow'].min()

2 answers:

Answer 0: (score: 1)

Try:

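import numpy as np

# Parse Start as datetime so that the .dt accessor is available
df['Start'] = pd.to_datetime(df['Start'])
# Row numbers of the neighbouring rows, ignoring boundaries for now
df['prevNum'] = df['rowNow'].shift()
df['nextNum'] = df['rowNow'].shift(-1)
# Where the hour differs from the neighbouring row, clear the value
df.loc[df['Start'].dt.hour != df['Start'].shift().dt.hour, 'prevNum'] = np.nan
df.loc[df['Start'].dt.hour != df['Start'].shift(-1).dt.hour, 'nextNum'] = np.nan
print(df)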

If the Start column is not in datetime format, run the following before everything else:

df['Start']=pd.to_datetime(df['Start'])

Output:

  index Case      Button                   Start  rowNow  prevNum  nextNum
0     x    a  2017-12-06 2018-09-11 10:17:43.227       0      NaN      1.0
1     x    b  2017-12-06 2018-09-11 10:17:44.876       1      0.0      2.0
2     x    c  2017-12-06 2018-09-11 10:17:45.719       2      1.0      NaN
3     y    a  2017-12-06 2018-09-11 15:28:57.500       3      NaN      4.0
4     y    e  2017-12-06 2018-09-11 15:29:19.079       4      3.0      NaN
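The hour comparison above separates the two cases in the sample data only because they happen to fall in different hours. A variation on the same shift-and-mask idea, offered as a rough sketch rather than as part of the original answer, is to compare the Case column with its neighbours instead. It assumes the rows of each Case are contiguous and already ordered by Start; the sample frame is rebuilt here only to make the snippet self-contained.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    "Case": ["x", "x", "x", "y", "y"],
    "Button": ["a", "b", "c", "a", "e"],
    "Start": pd.to_datetime([
        "2017-12-06 10:17:43.227",
        "2017-12-06 10:17:44.876",
        "2017-12-06 10:17:45.719",
        "2017-12-06 15:28:57.500",
        "2017-12-06 15:29:19.079",
    ]),
    "rowNow": [0, 1, 2, 3, 4],
})

# Row numbers of the neighbouring rows, ignoring Case boundaries for now.
df["prevNum"] = df["rowNow"].shift()
df["nextNum"] = df["rowNow"].shift(-1)

# Assumption: each Case forms one contiguous block of rows.
# Clear the value wherever the neighbouring row belongs to another Case.
df.loc[df["Case"] != df["Case"].shift(), "prevNum"] = np.nan
df.loc[df["Case"] != df["Case"].shift(-1), "nextNum"] = np.nan

print(df)
```

If the cases are interleaved, the frame would need to be sorted by Case and Start first for this to hold.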

Answer 1: (score: 1)

Try this:

df['prevNum'] = df.groupby('Case')['rowNow'].shift(1)
df['nextNum'] = df.groupby('Case')['rowNow'].shift(-1)
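For reference, here is a minimal self-contained sketch of the grouped shift applied to the sample data from the question. A grouped shift matches the loop in the question only if Start never decreases from row to row within a Case, so the sketch checks that assumption first. The name sorted_within_case is illustrative and not from the original post.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    "Case": ["x", "x", "x", "y", "y"],
    "Button": ["a", "b", "c", "a", "e"],
    "Start": pd.to_datetime([
        "2017-12-06 10:17:43.227",
        "2017-12-06 10:17:44.876",
        "2017-12-06 10:17:45.719",
        "2017-12-06 15:28:57.500",
        "2017-12-06 15:29:19.079",
    ]),
    "rowNow": [0, 1, 2, 3, 4],
})

# Assumption check: within every Case, Start is non-decreasing in row order.
sorted_within_case = (
    df.groupby("Case")["Start"]
    .apply(lambda s: s.is_monotonic_increasing)
    .all()
)
if not sorted_within_case:
    raise ValueError("Sort by Case and Start before using a grouped shift")

# Shift rowNow inside each Case: the previous/next row of the group
# is the predecessor/successor; group edges become NaN.
grouped = df.groupby("Case")["rowNow"]
df["prevNum"] = grouped.shift(1)
df["nextNum"] = grouped.shift(-1)

print(df)
```

Because the shifted columns contain NaN, pandas stores them as floats (0.0, 1.0, ...) rather than integers.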