Please, I would like to add an extra column holding the cumulative time difference (based on the ts_B attribute), as in the dataset below:
df1 input:
id_B,ts_B,course,weight,Phase,remainingTime,progressPercentage
id1,2017-04-27 01:35:30,cotton,3.5,A,03:15:00,23.0
id1,2017-04-27 01:36:00,cotton,3.5,A,03:14:00,23.0
id1,2017-04-27 01:36:30,cotton,3.5,A,03:14:00,24.0
id1,2017-04-27 01:37:00,cotton,3.5,B,03:13:00,24.0
id1,2017-04-27 01:37:30,cotton,3.5,B,03:13:00,24.0
id1,2017-04-27 01:38:00,cotton,3.5,B,03:13:00,24.0
id1,2017-04-27 01:38:30,cotton,3.5,C,03:13:00,24.0
id1,2017-04-27 01:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-04-27 01:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-04-27 01:40:00,cotton,3.5,Finish,00:01:00,100.0
id1,2017-04-27 02:35:30,cotton,3.5,A,03:15:00,1.0
id1,2017-04-27 02:36:00,cotton,3.5,A,03:14:00,2.0
id1,2017-04-27 02:36:30,cotton,3.5,A,03:14:00,2.0
id1,2017-04-27 02:37:00,cotton,3.5,B,03:13:00,3.0
id1,2017-04-27 02:37:30,cotton,3.5,B,03:13:00,4.0
id1,2017-04-27 02:38:00,cotton,3.5,B,03:13:00,5.0
id1,2017-04-27 02:38:30,cotton,3.5,C,03:13:00,98.0
id1,2017-04-27 02:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-04-27 02:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-04-27 02:40:00,cotton,3.5,Finish,00:01:00,100.0
id2,2017-04-27 03:36:00,cotton,3.5,A,03:14:00,1.0
id2,2017-04-27 03:36:30,cotton,3.5,A,03:14:00,1.0
id2,2017-04-27 03:37:00,cotton,3.5,B,03:13:00,2.0
id2,2017-04-27 03:37:30,cotton,3.5,B,03:13:00,2.0
id2,2017-04-27 03:38:00,cotton,3.5,B,03:13:00,3.0
id2,2017-04-27 03:38:30,cotton,3.5,C,03:13:00,98.0
id2,2017-04-27 03:39:00,cotton,3.5,C,00:02:00,99.0
id2,2017-04-27 03:39:30,cotton,3.5,C,00:01:00,100.0
id2,2017-04-27 03:40:00,cotton,3.5,Finish,00:01:00,100.0
df output:
id_B,ts_B,course,weight,Phase,remainingTime,progressPercentage,cum_delta_sec
id1,2017-04-27 01:35:30,cotton,3.5,A,03:15:00,23.0, 0
id1,2017-04-27 01:36:00,cotton,3.5,A,03:14:00,23.0, 30
id1,2017-04-27 01:36:30,cotton,3.5,A,03:14:00,24.0, 60
id1,2017-04-27 01:37:00,cotton,3.5,B,03:13:00,24.0, 90
id1,2017-04-27 01:37:30,cotton,3.5,B,03:13:00,24.0, 120
id1,2017-04-27 01:38:00,cotton,3.5,B,03:13:00,24.0, 150
id1,2017-04-27 01:38:30,cotton,3.5,C,03:13:00,24.0, 180
id1,2017-04-27 01:39:00,cotton,3.5,C,00:02:00,99.0, 210
id1,2017-04-27 01:39:30,cotton,3.5,C,00:01:00,100.0, 240
id1,2017-04-27 01:40:00,cotton,3.5,Finish,00:01:00,100.0, 270
id1,2017-04-27 02:35:30,cotton,3.5,A,03:15:00,1.0, 0
id1,2017-04-27 02:36:00,cotton,3.5,A,03:14:00,2.0, 30
id1,2017-04-27 02:36:30,cotton,3.5,A,03:14:00,2.0, 60
id1,2017-04-27 02:37:00,cotton,3.5,B,03:13:00,3.0, 90
id1,2017-04-27 02:37:30,cotton,3.5,B,03:13:00,4.0, 120
id1,2017-04-27 02:38:00,cotton,3.5,B,03:13:00,5.0, 150
id1,2017-04-27 02:38:30,cotton,3.5,C,03:13:00,98.0, 180
id1,2017-04-27 02:39:00,cotton,3.5,C,00:02:00,99.0, 210
id1,2017-04-27 02:39:30,cotton,3.5,C,00:01:00,100.0, 240
id1,2017-04-27 02:40:00,cotton,3.5,Finish,00:01:00,100.0, 270
id2,2017-04-27 03:36:00,cotton,3.5,A,03:14:00,1.0, 0
id2,2017-04-27 03:36:30,cotton,3.5,A,03:14:00,1.0, 30
id2,2017-04-27 03:37:00,cotton,3.5,B,03:13:00,2.0, 60
id2,2017-04-27 03:37:30,cotton,3.5,B,03:13:00,2.0, 90
id2,2017-04-27 03:38:00,cotton,3.5,B,03:13:00,3.0, 120
id2,2017-04-27 03:38:30,cotton,3.5,C,03:13:00,98.0, 150
id2,2017-04-27 03:39:00,cotton,3.5,C,00:02:00,99.0, 180
id2,2017-04-27 03:39:30,cotton,3.5,C,00:01:00,100.0, 210
id2,2017-04-27 03:40:00,cotton,3.5,Finish,00:01:00,100.0, 240
The solution provided in Add extra column as the cumulative time difference is slightly different, because the key that identifies a complete experiment is not based on id alone. I need to combine the id with the fact that, for each id, progressPercentage runs from some x > 0 up to 100.
Can you help with this? Thanks in advance for your support. Best regards, Carlo
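As a side note, the run boundary described in the question (progressPercentage climbing from a small value up to 100, then restarting) could also be detected directly from that column: within each id, any drop in progressPercentage marks the start of a new run. A minimal sketch on toy data (the column values here are assumptions for illustration, not the full dataset):

```python
import pandas as pd

df = pd.DataFrame({
    'id_B': ['id1'] * 4 + ['id2'] * 2,
    'progressPercentage': [23.0, 99.0, 100.0, 1.0, 1.0, 100.0],
})

# Within each id, a drop in progressPercentage signals the start of a new run;
# a cumulative sum over those boundary flags yields a per-run label.
drop = df.groupby('id_B')['progressPercentage'].diff().lt(0).astype(int)
df['run'] = drop.groupby(df['id_B']).cumsum()
# (id_B, run) then identifies one complete experiment.
```

Grouping by `['id_B', 'run']` would then give the same run segmentation without relying on the `Finish` marker.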
Answer 0 (score: 1)
You can shift the Phase column, compare it against an edge value such as Finish, and take the cumsum of the result; grouping by this new Series (together with id_B) splits the data into runs:
import numpy as np
import pandas as pd

df['ts_B'] = pd.to_datetime(df['ts_B'])
# A new run starts on the row immediately after a 'Finish' row.
a = df['Phase'].shift().eq('Finish').cumsum()
# Convert timestamps to epoch seconds, then accumulate per (id, run) group.
df['cum_delta_sec'] = (df['ts_B'].astype(np.int64).div(10**9)
                         .groupby([df['id_B'], a])
                         .transform(lambda x: x.diff().fillna(0).cumsum()))
print (df)
id_B ts_B course weight Phase remainingTime \
0 id1 2017-04-27 01:35:30 cotton 3.5 A 03:15:00
1 id1 2017-04-27 01:36:00 cotton 3.5 A 03:14:00
2 id1 2017-04-27 01:36:30 cotton 3.5 A 03:14:00
3 id1 2017-04-27 01:37:00 cotton 3.5 B 03:13:00
4 id1 2017-04-27 01:37:30 cotton 3.5 B 03:13:00
5 id1 2017-04-27 01:38:00 cotton 3.5 B 03:13:00
6 id1 2017-04-27 01:38:30 cotton 3.5 C 03:13:00
7 id1 2017-04-27 01:39:00 cotton 3.5 C 00:02:00
8 id1 2017-04-27 01:39:30 cotton 3.5 C 00:01:00
9 id1 2017-04-27 01:40:00 cotton 3.5 Finish 00:01:00
10 id1 2017-04-27 02:35:30 cotton 3.5 A 03:15:00
11 id1 2017-04-27 02:36:00 cotton 3.5 A 03:14:00
12 id1 2017-04-27 02:36:30 cotton 3.5 A 03:14:00
13 id1 2017-04-27 02:37:00 cotton 3.5 B 03:13:00
14 id1 2017-04-27 02:37:30 cotton 3.5 B 03:13:00
15 id1 2017-04-27 02:38:00 cotton 3.5 B 03:13:00
16 id1 2017-04-27 02:38:30 cotton 3.5 C 03:13:00
17 id1 2017-04-27 02:39:00 cotton 3.5 C 00:02:00
18 id1 2017-04-27 02:39:30 cotton 3.5 C 00:01:00
19 id1 2017-04-27 02:40:00 cotton 3.5 Finish 00:01:00
20 id2 2017-04-27 03:36:00 cotton 3.5 A 03:14:00
21 id2 2017-04-27 03:36:30 cotton 3.5 A 03:14:00
22 id2 2017-04-27 03:37:00 cotton 3.5 B 03:13:00
23 id2 2017-04-27 03:37:30 cotton 3.5 B 03:13:00
24 id2 2017-04-27 03:38:00 cotton 3.5 B 03:13:00
25 id2 2017-04-27 03:38:30 cotton 3.5 C 03:13:00
26 id2 2017-04-27 03:39:00 cotton 3.5 C 00:02:00
27 id2 2017-04-27 03:39:30 cotton 3.5 C 00:01:00
28 id2 2017-04-27 03:40:00 cotton 3.5 Finish 00:01:00
progressPercentage cum_delta_sec
0 23.0 0.0
1 23.0 30.0
2 24.0 60.0
3 24.0 90.0
4 24.0 120.0
5 24.0 150.0
6 24.0 180.0
7 99.0 210.0
8 100.0 240.0
9 100.0 270.0
10 1.0 0.0
11 2.0 30.0
12 2.0 60.0
13 3.0 90.0
14 4.0 120.0
15 5.0 150.0
16 98.0 180.0
17 99.0 210.0
18 100.0 240.0
19 100.0 270.0
20 1.0 0.0
21 1.0 30.0
22 2.0 60.0
23 2.0 90.0
24 3.0 120.0
25 98.0 150.0
26 99.0 180.0
27 100.0 210.0
28 100.0 240.0
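For reference, the same cumulative seconds can be obtained without the int64 cast by subtracting each group's first timestamp. A minimal sketch of that variant on a few sample rows (the toy frame below is an assumption for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'id_B': ['id1', 'id1', 'id1', 'id1'],
    'ts_B': pd.to_datetime(['2017-04-27 01:35:30', '2017-04-27 01:36:00',
                            '2017-04-27 01:40:00', '2017-04-27 02:35:30']),
    'Phase': ['A', 'C', 'Finish', 'A'],
})

# New run starts right after each 'Finish' row.
g = df['Phase'].shift().eq('Finish').cumsum()
# Seconds elapsed since the first timestamp of each (id, run) group.
df['cum_delta_sec'] = (df.groupby([df['id_B'], g])['ts_B']
                         .transform(lambda s: (s - s.iloc[0]).dt.total_seconds()))
```

This avoids the deprecated datetime-to-int conversion and keeps the arithmetic in timedelta space.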