Question

Pandas DataFrame, computing the Time Difference between one row and other row which satisfies a condition

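Similar to this question, given the DataFrame built below:

I want to find the cumulative sum of master-event occurrences, and the cumulative sums of Event A and Event B between master events. In other words, each occurrence of the master event resets the cumulative sum of Event A to zero.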

The sample output is shown below.

Sample input code from @Jon Strutz:

import pandas as pd

# Build the timestamp components, one entry per row
df = pd.DataFrame({'year': [2019] * 10,
                   'month': [8] * 10,
                   'day': [16] * 10,
                   'hour': [12, 12, 12, 12, 13, 13, 13, 13, 13, 13],
                   'minute': [50, 52, 53, 57, 0, 3, 4, 5, 13, 21]})

# Combine the components into a single datetime column
df = pd.DataFrame(pd.to_datetime(df), columns=['Time_Stamp'])
# 1 marks a row where the event occurred
df['Event_Master'] = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
df['Event_B']      = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]

The expected output might look like:

df['Event_Master_Out'] = [0, 0, 1, 1, 1, 1, 1, 1, 2, 2]
df['Event_B_Out']      = [0, 0, 0, 1, 1, 1, 2, 2, 0, 1]

Answer 1

Use Series.cumsum, then pass its output as the grouping key for GroupBy.cumsum:

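# Running count of master events; it also serves as a group id for each segment
df['Event_Master_Out'] = df['Event_Master'].cumsum()
# Cumulative sum of Event_B restarts within each master-event segment
df['Event_B_Out'] = df.groupby('Event_Master_Out')['Event_B'].cumsum()
print(df)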
           Time_Stamp  Event_Master  Event_B  Event_Master_Out  Event_B_Out
0 2019-08-16 12:50:00             0        0                 0            0
1 2019-08-16 12:52:00             0        0                 0            0
2 2019-08-16 12:53:00             1        0                 1            0
3 2019-08-16 12:57:00             0        1                 1            1
4 2019-08-16 13:00:00             0        0                 1            1
5 2019-08-16 13:03:00             0        0                 1            1
6 2019-08-16 13:04:00             0        1                 1            2
7 2019-08-16 13:05:00             0        0                 1            2
8 2019-08-16 13:13:00             1        0                 2            0
9 2019-08-16 13:21:00             0        1                 2            1
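
The same idea can be wrapped in a small reusable helper. The following is only a sketch: the function name `cumsum_with_reset` and its parameters are hypothetical and not part of pandas, and it assumes the reset column is nonzero exactly on the rows where the running total should restart.

```python
import pandas as pd


def cumsum_with_reset(values: pd.Series, reset: pd.Series) -> pd.Series:
    """Running total of `values` that restarts on rows where `reset` is nonzero.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Every nonzero entry in `reset` opens a new segment id.
    segment_id = reset.ne(0).cumsum()
    # The cumulative sum is computed independently inside each segment.
    return values.groupby(segment_id).cumsum()


event_master = pd.Series([0, 0, 1, 0, 0, 0, 0, 0, 1, 0])
event_b = pd.Series([0, 0, 0, 1, 0, 0, 1, 0, 0, 1])

result = cumsum_with_reset(event_b, event_master)
print(result.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 0, 1]
```

Note that the row carrying the reset flag belongs to the new segment, so a value of Event_B on that same row is counted toward the restarted total rather than the previous one.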
