添加额外列作为累积时差:复杂案例

时间:2017-07-24 09:56:07

标签: python pandas dataframe

请,我想添加一个额外的列作为累积时差(ts_B aatribute),如下面的数据集所示:

df1输入:

id_B, ts_B,course,weight,Phase,remainingTime,progressPercentage
id1,2017-04-27 01:35:30,cotton,3.5,A,03:15:00,23.0
id1,2017-04-27 01:36:00,cotton,3.5,A,03:14:00,23.0  
id1,2017-04-27 01:36:30,cotton,3.5,A,03:14:00,24.0 
id1,2017-04-27 01:37:00,cotton,3.5,B,03:13:00,24.0
id1,2017-04-27 01:37:30,cotton,3.5,B,03:13:00,24.0
id1,2017-04-27 01:38:00,cotton,3.5,B,03:13:00,24.0
id1,2017-04-27 01:38:30,cotton,3.5,C,03:13:00,24.0
id1,2017-04-27 01:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-04-27 01:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-04-27 01:40:00,cotton,3.5,Finish,00:01:00,100.0
id1,2017-04-27 02:35:30,cotton,3.5,A,03:15:00,1.0
id1,2017-04-27 02:36:00,cotton,3.5,A,03:14:00,2.0  
id1,2017-04-27 02:36:30,cotton,3.5,A,03:14:00,2.0 
id1,2017-04-27 02:37:00,cotton,3.5,B,03:13:00,3.0
id1,2017-04-27 02:37:30,cotton,3.5,B,03:13:00,4.0
id1,2017-04-27 02:38:00,cotton,3.5,B,03:13:00,5.0
id1,2017-04-27 02:38:30,cotton,3.5,C,03:13:00,98.0
id1,2017-04-27 02:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-04-27 02:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-04-27 02:40:00,cotton,3.5,Finish,00:01:00,100.0
id2,2017-04-27 03:36:00,cotton,3.5,A,03:14:00,1.0
id2,2017-04-27 03:36:30,cotton,3.5,A,03:14:00,1.0 
id2,2017-04-27 03:37:00,cotton,3.5,B,03:13:00,2.0
id2,2017-04-27 03:37:30,cotton,3.5,B,03:13:00,2.0
id2,2017-04-27 03:38:00,cotton,3.5,B,03:13:00,3.0
id2,2017-04-27 03:38:30,cotton,3.5,C,03:13:00,98.0
id2,2017-04-27 03:39:00,cotton,3.5,C,00:02:00,99.0
id2,2017-04-27 03:39:30,cotton,3.5,C,00:01:00,100.0
id2,2017-04-27 03:40:00,cotton,3.5,Finish,00:01:00,100.0

df输出:

 id_B,ts_B,course,weight,Phase,remainingTime,progressPercentage,cum_delta_sec
 id1,2017-04-27 01:35:30,cotton,3.5,A,03:15:00,23.0,                0
 id1,2017-04-27 01:36:00,cotton,3.5,A,03:14:00,23.0,                30  
 id1,2017-04-27 01:36:30,cotton,3.5,A,03:14:00,24.0,                60 
 id1,2017-04-27 01:37:00,cotton,3.5,B,03:13:00,24.0,                90
 id1,2017-04-27 01:37:30,cotton,3.5,B,03:13:00,24.0,                120
 id1,2017-04-27 01:38:00,cotton,3.5,B,03:13:00,24.0,                150
 id1,2017-04-27 01:38:30,cotton,3.5,C,03:13:00,24.0,                180
 id1,2017-04-27 01:39:00,cotton,3.5,C,00:02:00,99.0,                210
 id1,2017-04-27 01:39:30,cotton,3.5,C,00:01:00,100.0,               240
 id1,2017-04-27 01:40:00,cotton,3.5,Finish,00:01:00,100.0,          270
 id1,2017-04-27 02:35:30,cotton,3.5,A,03:15:00,1.0,                 0
 id1,2017-04-27 02:36:00,cotton,3.5,A,03:14:00,2.0,                 30  
 id1,2017-04-27 02:36:30,cotton,3.5,A,03:14:00,2.0,                 60 
 id1,2017-04-27 02:37:00,cotton,3.5,B,03:13:00,3.0,                 90
 id1,2017-04-27 02:37:30,cotton,3.5,B,03:13:00,4.0,                 120
 id1,2017-04-27 02:38:00,cotton,3.5,B,03:13:00,5.0,                 150
 id1,2017-04-27 02:38:30,cotton,3.5,C,03:13:00,98.0,                180
 id1,2017-04-27 02:39:00,cotton,3.5,C,00:02:00,99.0,                210
 id1,2017-04-27 02:39:30,cotton,3.5,C,00:01:00,100.0,               240
 id1,2017-04-27 02:40:00,cotton,3.5,Finish,00:01:00,100.0,          270
 id2,2017-04-27 03:36:00,cotton,3.5,A,03:14:00,1.0,                 0  
 id2,2017-04-27 03:36:30,cotton,3.5,A,03:14:00,1.0,                 30 
 id2,2017-04-27 03:37:00,cotton,3.5,B,03:13:00,2.0,                 60
 id2,2017-04-27 03:37:30,cotton,3.5,B,03:13:00,2.0,                 90
 id2,2017-04-27 03:38:00,cotton,3.5,B,03:13:00,3.0,                 120
 id2,2017-04-27 03:38:30,cotton,3.5,C,03:13:00,98.0,                150
 id2,2017-04-27 03:39:00,cotton,3.5,C,00:02:00,99.0,                180
 id2,2017-04-27 03:39:30,cotton,3.5,C,00:01:00,100.0,               210
 id2,2017-04-27 03:40:00,cotton,3.5,Finish,00:01:00,100.0,          240

Add extra column as the cumulative time difference提供的解决方案有点不同,因为识别完整实验的关键不仅仅基于id。我需要插入id和事实,即每个id," progressPercentage"从x> 0到100.

请对此有何帮助? 提前感谢您的支持。 最好的祝福, 卡罗

1 个答案:

答案 0 :(得分:1)

您可以shift使用comapre somee边缘值,例如Finish列中的Phasecumsum作为新Series进行分组:

df.ts_B = pd.to_datetime(df.ts_B)
a = df['Phase'].shift().eq('Finish').cumsum()

df['cum_delta_sec'] = df.ts_B.astype(np.int64).div(10**9)
                       .groupby([df.id_B,a])
                       .transform(lambda x: x.diff().fillna(0).cumsum())
print (df)
   id_B                ts_B  course  weight   Phase remainingTime  \
0   id1 2017-04-27 01:35:30  cotton     3.5       A      03:15:00   
1   id1 2017-04-27 01:36:00  cotton     3.5       A      03:14:00   
2   id1 2017-04-27 01:36:30  cotton     3.5       A      03:14:00   
3   id1 2017-04-27 01:37:00  cotton     3.5       B      03:13:00   
4   id1 2017-04-27 01:37:30  cotton     3.5       B      03:13:00   
5   id1 2017-04-27 01:38:00  cotton     3.5       B      03:13:00   
6   id1 2017-04-27 01:38:30  cotton     3.5       C      03:13:00   
7   id1 2017-04-27 01:39:00  cotton     3.5       C      00:02:00   
8   id1 2017-04-27 01:39:30  cotton     3.5       C      00:01:00   
9   id1 2017-04-27 01:40:00  cotton     3.5  Finish      00:01:00   
10  id1 2017-04-27 02:35:30  cotton     3.5       A      03:15:00   
11  id1 2017-04-27 02:36:00  cotton     3.5       A      03:14:00   
12  id1 2017-04-27 02:36:30  cotton     3.5       A      03:14:00   
13  id1 2017-04-27 02:37:00  cotton     3.5       B      03:13:00   
14  id1 2017-04-27 02:37:30  cotton     3.5       B      03:13:00   
15  id1 2017-04-27 02:38:00  cotton     3.5       B      03:13:00   
16  id1 2017-04-27 02:38:30  cotton     3.5       C      03:13:00   
17  id1 2017-04-27 02:39:00  cotton     3.5       C      00:02:00   
18  id1 2017-04-27 02:39:30  cotton     3.5       C      00:01:00   
19  id1 2017-04-27 02:40:00  cotton     3.5  Finish      00:01:00   
20  id2 2017-04-27 03:36:00  cotton     3.5       A      03:14:00   
21  id2 2017-04-27 03:36:30  cotton     3.5       A      03:14:00   
22  id2 2017-04-27 03:37:00  cotton     3.5       B      03:13:00   
23  id2 2017-04-27 03:37:30  cotton     3.5       B      03:13:00   
24  id2 2017-04-27 03:38:00  cotton     3.5       B      03:13:00   
25  id2 2017-04-27 03:38:30  cotton     3.5       C      03:13:00   
26  id2 2017-04-27 03:39:00  cotton     3.5       C      00:02:00   
27  id2 2017-04-27 03:39:30  cotton     3.5       C      00:01:00   
28  id2 2017-04-27 03:40:00  cotton     3.5  Finish      00:01:00   

    progressPercentage  cum_delta_sec  
0                 23.0            0.0  
1                 23.0           30.0  
2                 24.0           60.0  
3                 24.0           90.0  
4                 24.0          120.0  
5                 24.0          150.0  
6                 24.0          180.0  
7                 99.0          210.0  
8                100.0          240.0  
9                100.0          270.0  
10                 1.0            0.0  
11                 2.0           30.0  
12                 2.0           60.0  
13                 3.0           90.0  
14                 4.0          120.0  
15                 5.0          150.0  
16                98.0          180.0  
17                99.0          210.0  
18               100.0          240.0  
19               100.0          270.0  
20                 1.0            0.0  
21                 1.0           30.0  
22                 2.0           60.0  
23                 2.0           90.0  
24                 3.0          120.0  
25                98.0          150.0  
26                99.0          180.0  
27               100.0          210.0  
28               100.0          240.0