Best way to compare the values of a given column pairwise (two by two)

Asked: 2018-07-23 21:48:28

Tags: python pandas

I am trying to compare the values of a given DataFrame column pairwise (previous vs. current row) in order to create a new column.

My input df is:

            timestamp  charging
0 2017-10-15 18:36:46         1
1 2017-10-15 18:41:54         1
2 2017-10-15 18:46:54         1
3 2017-10-15 18:50:35         1
4 2017-10-15 18:54:14        -1
5 2017-10-15 18:57:54        -1
6 2017-10-15 19:02:47        -1
7 2017-10-15 19:11:41         1
8 2017-10-15 19:21:25         1
9 2017-10-15 19:31:04        -1
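
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: it assumes `timestamp` is a datetime64 column and `charging` is an integer column, which the printed frame suggests but does not confirm.

```python
import pandas as pd

# Rebuild the sample input; column names follow the question.
# The dtypes (datetime64 and int) are an assumption based on the printed frame.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-10-15 18:36:46", "2017-10-15 18:41:54", "2017-10-15 18:46:54",
        "2017-10-15 18:50:35", "2017-10-15 18:54:14", "2017-10-15 18:57:54",
        "2017-10-15 19:02:47", "2017-10-15 19:11:41", "2017-10-15 19:21:25",
        "2017-10-15 19:31:04",
    ]),
    "charging": [1, 1, 1, 1, -1, -1, -1, 1, 1, -1],
})
```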

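I want to create a new column that holds the row's own timestamp only where the charging value changes from positive to negative or from negative to positive (both the row before the change and the row after it are marked). The output should be: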

            timestamp  charging period start/end time
0 2017-10-15 18:36:46         1                   NaT
1 2017-10-15 18:41:54         1                   NaT
2 2017-10-15 18:46:54         1                   NaT
3 2017-10-15 18:50:35         1   2017-10-15 18:50:35
4 2017-10-15 18:54:14        -1   2017-10-15 18:54:14
5 2017-10-15 18:57:54        -1                   NaT
6 2017-10-15 19:02:47        -1   2017-10-15 19:02:47
7 2017-10-15 19:11:41         1   2017-10-15 19:11:41
8 2017-10-15 19:21:25         1   2017-10-15 19:21:25
9 2017-10-15 19:31:04        -1   2017-10-15 19:31:04

I did it with the code below, which is not pretty (but it works):

df['period start/end time'] = pd.NaT

for ind in df.index:
    if ind > 0:
        # negative -> positive: mark both the previous and the current row
        if df.at[ind, 'charging'] > 0 and df.at[ind-1, 'charging'] < 0:
            df.at[ind-1, 'period start/end time'] = df.at[ind-1, 'timestamp']
            df.at[ind, 'period start/end time'] = df.at[ind, 'timestamp']

        # positive -> negative: mark both rows in the same way
        if df.at[ind, 'charging'] < 0 and df.at[ind-1, 'charging'] > 0:
            df.at[ind-1, 'period start/end time'] = df.at[ind-1, 'timestamp']
            df.at[ind, 'period start/end time'] = df.at[ind, 'timestamp']

This takes a lot of time! Is there a faster and better way to do it?

2 answers:

Answer 0 (score: 3)

IIUC,

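# True on each row whose value differs from the previous row
mask = (df.charging != df.charging.shift().bfill())
# also flag the row just before each change, then copy the timestamp
df.loc[mask | mask.shift(-1, fill_value=False), 'new'] = df.timestamp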

    timestamp             charging  new
0   2017-10-15 18:36:46   1         NaT
1   2017-10-15 18:41:54   1         NaT
2   2017-10-15 18:46:54   1         NaT
3   2017-10-15 18:50:35   1         2017-10-15 18:50:35
4   2017-10-15 18:54:14  -1         2017-10-15 18:54:14
5   2017-10-15 18:57:54  -1         NaT
6   2017-10-15 19:02:47  -1         2017-10-15 19:02:47
7   2017-10-15 19:11:41   1         2017-10-15 19:11:41
8   2017-10-15 19:21:25   1         2017-10-15 19:21:25
9   2017-10-15 19:31:04  -1         2017-10-15 19:31:04
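
To make the two halves of that mask easier to follow, here is a self-contained sketch of the same shift-and-compare idea. The helper name `mark_sign_changes` and the six-row sample frame are hypothetical, not part of the answer, and `Series.where` is swapped in for the `df.loc` assignment.

```python
import pandas as pd

def mark_sign_changes(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: timestamps on rows adjacent to a change in
    `charging`, NaT everywhere else."""
    # Previous row's value; bfill() makes the first row compare with itself.
    prev = df["charging"].shift().bfill()
    changed = df["charging"].ne(prev)                     # row right after a change
    before_change = changed.shift(-1, fill_value=False)   # row right before a change
    # where() keeps the timestamp where the mask is True and NaT elsewhere.
    return df["timestamp"].where(changed | before_change)

# Small made-up frame, used only to illustrate the helper.
df = pd.DataFrame({
    "timestamp": pd.date_range("2017-10-15 18:00", periods=6, freq="5min"),
    "charging": [1, 1, -1, -1, -1, 1],
})
df["new"] = mark_sign_changes(df)
```

Rows 1, 2, 4 and 5 of this small frame receive their own timestamp; rows 0 and 3 stay NaT because neither neighbour boundary involves a change.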

Answer 1 (score: 1)

Create a mask:

# fillna(0) on the shifted diff keeps the trailing NaN from always flagging the last row
condition = df.charging.diff().bfill().ne(0) | df.charging.diff().shift(-1).fillna(0).ne(0)

Use np.where:

df['new'] = np.where(condition, df.timestamp, pd.NaT)   

            timestamp  charging                  new
0 2017-10-15 18:36:46         1                  NaT
1 2017-10-15 18:41:54         1                  NaT
2 2017-10-15 18:46:54         1                  NaT
3 2017-10-15 18:50:35         1  2017-10-15 18:50:35
4 2017-10-15 18:54:14        -1  2017-10-15 18:54:14
5 2017-10-15 18:57:54        -1                  NaT
6 2017-10-15 19:02:47        -1  2017-10-15 19:02:47
7 2017-10-15 19:11:41         1  2017-10-15 19:11:41
8 2017-10-15 19:21:25         1  2017-10-15 19:21:25
9 2017-10-15 19:31:04        -1  2017-10-15 19:31:04
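
A possible variant of this answer that stays inside pandas is sketched below. It swaps `np.where` for `Series.where`, on the assumption that this keeps the datetime64 dtype more reliably than mixing a datetime column with `pd.NaT` inside NumPy (this may depend on the NumPy and pandas versions in use). The five-row frame is made up for illustration.

```python
import pandas as pd

# Made-up sample; only the column names come from the question.
df = pd.DataFrame({
    "timestamp": pd.date_range("2017-10-15 18:00", periods=5, freq="5min"),
    "charging": [1, -1, -1, 1, 1],
})

step = df["charging"].diff()                 # NaN on the first row
after_change = step.fillna(0).ne(0)          # row right after a change
# shift(-1) leaves NaN on the last row; fillna(0) stops it from being flagged.
before_change = step.shift(-1).fillna(0).ne(0)
condition = after_change | before_change

# Series.where keeps the timestamp where condition is True, NaT elsewhere.
df["new"] = df["timestamp"].where(condition)
```

Here the last row stays NaT because nothing changes between rows 3 and 4, which is the case the extra `fillna(0)` is there to handle.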