I have a dataframe with the following dtypes:
Case_Key int64
Activity object
Timestamp datetime64[ns]
Vendor object
Plant object
Country object
City object
Net_Order_Value float64
Order_Queantity float64
Time_Difference timedelta64[ns]
dtype: object
and this structure:
Case_Key Timestamp Time_Difference
0 1000 2016-01-01 08:35:07 0 days
5 1000 2016-01-20 08:35:07 8 days
6 1000 2016-01-26 08:35:07 6 days
7 1000 2016-02-09 08:35:07 14 days
8 10000 2016-01-26 11:57:47 0 days
11 10000 2016-02-05 11:57:47 7 days
12 10000 2016-02-11 11:57:47 6 days
13 10000 2016-02-26 11:57:47 15 days
14 100000 2016-10-13 10:00:01 0 days
17 100000 2016-10-26 10:00:01 9 days
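Example of what I want to achieve: take the first timestamp (2016-01-01), add the Time_Difference from the second row (8 days), and replace the second row's timestamp with the result (2016-01-09). Then do the same for the next row within the same Case_Key group: take the previous timestamp (2016-01-09), add 6 days, and get the expected result (2016-01-15).
Desired result: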
0 1000 2016-01-01 08:35:07 0 days
5 1000 2016-01-**09** 08:35:07 8 days
6 1000 2016-01-**15** 08:35:07 6 days
7 1000 2016-01-**29** 08:35:07 14 days
I want to do this for every row, grouped by the same Case_Key.
Answer 0 (score: 0)
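Use GroupBy.transform with 'first' to repeat the first Timestamp value across each group, then use Series.add to add the cumulative sum of the timedeltas, computed per group with Series.cumsum: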
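import pandas as pd

# Per-group running total of the timedeltas; transform keeps df's row index.
s = df.groupby('Case_Key')['Time_Difference'].transform(lambda x: x.cumsum())
# The direct groupby cumsum failed with timedeltas:
# s = df.groupby('Case_Key')['Time_Difference'].cumsum()
# First Timestamp of each group plus the running total.
df['Timestamp'] = df.groupby('Case_Key')['Timestamp'].transform('first').add(s)
print(df)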
Case_Key Timestamp Time_Difference
0 1000 2016-01-01 08:35:07 0 days
5 1000 2016-01-09 08:35:07 8 days
6 1000 2016-01-15 08:35:07 6 days
7 1000 2016-01-29 08:35:07 14 days
8 10000 2016-01-26 11:57:47 0 days
11 10000 2016-02-02 11:57:47 7 days
12 10000 2016-02-08 11:57:47 6 days
13 10000 2016-02-23 11:57:47 15 days
14 100000 2016-10-13 10:00:01 0 days
17 100000 2016-10-22 10:00:01 9 days
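
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The frame is rebuilt by hand from the first six rows of the question, keeping only the three columns shown there. The names `offset` and `start` are mine, not from the original answer, and the per-group cumulative sum is written as `transform` with a lambda, on the assumption that this form keeps the original row index across pandas versions.

```python
import pandas as pd

# Sample rows copied from the question (first two Case_Key groups only).
df = pd.DataFrame({
    "Case_Key": [1000, 1000, 1000, 1000, 10000, 10000],
    "Timestamp": pd.to_datetime([
        "2016-01-01 08:35:07", "2016-01-20 08:35:07", "2016-01-26 08:35:07",
        "2016-02-09 08:35:07", "2016-01-26 11:57:47", "2016-02-05 11:57:47",
    ]),
    "Time_Difference": pd.to_timedelta([
        "0 days", "8 days", "6 days", "14 days", "0 days", "7 days",
    ]),
})
# Non-contiguous index, as in the question, to show that alignment is by label.
df.index = [0, 5, 6, 7, 8, 11]

grouped = df.groupby("Case_Key")
# Running total of the time differences inside each group.
offset = grouped["Time_Difference"].transform(lambda x: x.cumsum())
# First timestamp of each group, repeated on every row of that group.
start = grouped["Timestamp"].transform("first")
# New timestamp = group start + accumulated difference.
df["Timestamp"] = start + offset

print(df)
```

Because every row is computed as the group's first timestamp plus the accumulated differences, no explicit loop over "previous" rows is needed. This is the same as chaining the additions row by row as long as the first row of each group has a Time_Difference of 0 days, as it does in the sample data.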