Pandas: reset an increment based on specific column values

Time: 2019-02-08 15:21:26

Tags: python pandas

I want to create a column that increases by 1 for every row where diffs is not NaT. Whenever the value is NaT, the increment should reset to 0.

Here is a sample DataFrame:

              x        y      min      z        o     diffs
0             0        0       0       1        1      NaT
1             0        0       0       2        1 00:00:01
2             0        0       0       6        1 00:00:04
3             0        0       0      11        1 00:00:05
4             0        0       0      14        0      NaT
5             0        0       2      18        0      NaT
6             0        0       2      41        1      NaT
7             0        0       2      42        0      NaT
8             0        0       8      13        1 00:00:54
9             0        0       8      16        1 00:00:03
10            0        0       8      17        1 00:00:01
11            0        0       8      20        0      NaT
12            0        0       8      32        1      NaT

This is my expected output:

              x        y      min      z        o     diffs   increment
0             0        0       0       1        1      NaT      0
1             0        0       0       2        1 00:00:01      1
2             0        0       0       6        1 00:00:04      2
3             0        0       0      11        1 00:00:05      3
4             0        0       0      14        0      NaT      0
5             0        0       2      18        0      NaT      0
6             0        0       2      41        1      NaT      0
7             0        0       2      42        0      NaT      0
8             0        0       8      13        1 00:00:54      1
9             0        0       8      16        1 00:00:03      2
10            0        0       8      17        1 00:00:01      3
11            0        0       8      20        0      NaT      0
12            0        0       8      32        1      NaT      0

1 answer:

Answer 0: (score: 2)

Use numpy.where with a mask of the non-missing values, and use cumcount to number the rows within each consecutive run of non-missing values:

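import numpy as np
m = df['diffs'].notnull()  # True where diffs is not NaT
# number the rows within each consecutive run of m, then zero out the NaT runs
df['increment'] = np.where(m, df.groupby(m.ne(m.shift()).cumsum()).cumcount() + 1, 0)
print(df)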
    x  y  min   z  o    diffs  increment
0   0  0    0   1  1      NaT          0
1   0  0    0   2  1 00:00:01          1
2   0  0    0   6  1 00:00:04          2
3   0  0    0  11  1 00:00:05          3
4   0  0    0  14  0      NaT          0
5   0  0    2  18  0      NaT          0
6   0  0    2  41  1      NaT          0
7   0  0    2  42  0      NaT          0
8   0  0    8  13  1 00:00:54          1
9   0  0    8  16  1 00:00:03          2
10  0  0    8  17  1 00:00:01          3
11  0  0    8  20  0      NaT          0
12  0  0    8  32  1      NaT          0
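
To see why the grouping works, it can help to look at the intermediate Series one at a time. The following is a minimal sketch, not part of the original answer: it rebuilds only the diffs column from the example (how the sample data is constructed is an assumption, and the other columns are left out), and the names run_id and pos are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: rebuild only the 'diffs' column of the example; NaN becomes NaT.
secs = [np.nan, 1, 4, 5, np.nan, np.nan, np.nan, np.nan, 54, 3, 1, np.nan, np.nan]
df = pd.DataFrame({'diffs': pd.to_timedelta(secs, unit='s')})

m = df['diffs'].notnull()                # True where diffs has a value
run_id = m.ne(m.shift()).cumsum()        # a new id every time m flips
pos = df.groupby(run_id).cumcount() + 1  # 1-based position inside each run
df['increment'] = np.where(m, pos, 0)    # keep positions only in non-NaT runs

# Show the helper Series next to the result.
print(df.assign(m=m, run_id=run_id, pos=pos))
```

The NaT runs also receive positions (1, 2, 3, 4 for rows 4 to 7), which is why the final numpy.where is needed to force them back to 0.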

If performance matters, here is an alternative solution:

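b = m.cumsum()  # running count of non-NaT rows
# subtract the running count as it stood at the most recent NaT row
df['increment'] = b - b.mask(m).ffill().fillna(0).astype(int)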
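
The cumsum version is terse, so a step-by-step sketch may help. It is an illustration under stated assumptions rather than part of the original answer: the diffs values are rebuilt by hand from the example, and the names base, fast, and slow are hypothetical. The last lines check that both approaches agree on this data; no timing claims are made here.

```python
import numpy as np
import pandas as pd

# Assumption: the 'diffs' values from the example, with NaN standing in for NaT.
secs = [np.nan, 1, 4, 5, np.nan, np.nan, np.nan, np.nan, 54, 3, 1, np.nan, np.nan]
diffs = pd.Series(pd.to_timedelta(secs, unit='s'))

m = diffs.notnull()
b = m.cumsum()  # running count of non-NaT rows
# Hide the count on non-NaT rows, then carry the last NaT-row count forward.
# fillna(0) covers the case where the data starts with a non-NaT row.
base = b.mask(m).ffill().fillna(0).astype(int)
fast = b - base  # non-NaT rows seen since the last NaT

# Cross-check against the groupby/cumcount approach.
slow = np.where(m, m.groupby(m.ne(m.shift()).cumsum()).cumcount() + 1, 0)
assert (fast.to_numpy() == slow).all()
print(fast.tolist())  # [0, 1, 2, 3, 0, 0, 0, 0, 1, 2, 3, 0, 0]
```

This variant avoids groupby entirely and relies only on cumulative, elementwise operations.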