我想使用progressPercentage从1.0变为100的条件,将原始数据帧过滤并拆分为多个数据帧,如下例所示:
输入:
id_B, ts_B,course,weight,Phase,remainingTime,progressPercentage
id1,2017-04-27 01:35:30,cotton,3.5,A,01:15:00,23.0
id1,2017-04-27 01:37:30,cotton,3.5,B,01:13:00,24.0
id1,2017-04-27 01:38:00,cotton,3.5,B,01:13:00,24.0
id1,2017-04-27 01:38:30,cotton,3.5,C,01:13:00,24.0
id1,2017-04-27 01:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-04-27 01:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-04-27 01:40:00,cotton,3.5,Finish,00:01:00,100.0
id1,2017-04-27 02:35:30,cotton,3.5,A,03:15:00,1.0
id1,2017-04-27 02:36:00,cotton,3.5,A,03:14:00,2.0
id1,2017-04-27 02:36:30,cotton,3.5,A,03:14:00,2.0
id1,2017-04-27 02:37:00,cotton,3.5,B,03:13:00,3.0
id1,2017-04-27 02:37:30,cotton,3.5,B,03:13:00,4.0
id1,2017-04-27 02:38:00,cotton,3.5,B,03:13:00,5.0
id1,2017-04-27 02:38:30,cotton,3.5,C,03:13:00,98.0
id1,2017-04-27 02:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-04-27 02:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-04-27 02:40:00,cotton,3.5,Finish,00:01:00,100.0
id2,2017-04-27 03:36:00,cotton,3.5,A,03:15:00,1.0
id2,2017-04-27 03:36:30,cotton,3.5,A,03:14:00,1.0
id2,2017-04-27 03:37:00,cotton,3.5,B,03:13:00,2.0
id2,2017-04-27 03:37:30,cotton,3.5,B,03:13:00,2.0
id2,2017-04-27 03:38:00,cotton,3.5,B,03:13:00,3.0
id2,2017-04-27 03:38:30,cotton,3.5,C,03:13:00,98.0
id2,2017-04-27 03:39:00,cotton,3.5,C,00:02:00,99.0
id2,2017-04-27 03:39:30,cotton,3.5,C,00:01:00,100.0
id2,2017-04-27 03:40:00,cotton,3.5,Finish,00:01:00,100.0
id1,2017-05-27 01:35:30,cotton,3.5,A,03:15:00,23.0
id1,2017-05-27 01:37:30,cotton,3.5,B,03:13:00,24.0
id1,2017-05-27 01:38:00,cotton,3.5,B,03:13:00,24.0
id1,2017-05-27 01:38:30,cotton,3.5,C,03:13:00,24.0
id1,2017-05-27 01:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-05-27 01:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-05-27 01:40:00,cotton,3.5,Finish,00:01:00,100.0
id1,2017-05-27 02:35:30,cotton,3.5,A,01:15:00,1.0
id1,2017-05-27 02:36:00,cotton,3.5,A,01:14:00,2.0
id1,2017-05-27 02:36:30,cotton,3.5,A,01:13:00,2.0
id1,2017-05-27 02:37:00,cotton,3.5,B,01:12:00,3.0
id1,2017-05-27 02:37:30,cotton,3.5,B,01:11:00,4.0
id1,2017-05-27 02:38:00,cotton,3.5,B,01:10:00,5.0
id1,2017-05-27 02:38:30,cotton,3.5,C,01:09:00,98.0
id1,2017-05-27 02:39:00,cotton,3.5,C,00:08:00,99.0
输出:
id_B, ts_B,course,weight,Phase,remainingTime,progressPercentage
id1,2017-04-27 01:35:30,cotton,3.5,A,01:15:00,23.0
id1,2017-04-27 01:37:30,cotton,3.5,B,01:13:00,24.0
id1,2017-04-27 01:38:00,cotton,3.5,B,01:13:00,24.0
id1,2017-04-27 01:38:30,cotton,3.5,C,01:13:00,24.0
id1,2017-04-27 01:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-04-27 01:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-04-27 01:40:00,cotton,3.5,Finish,00:01:00,100.0
id_B, ts_B,course,weight,Phase,remainingTime,progressPercentage
id1,2017-04-27 02:35:30,cotton,3.5,A,03:15:00,1.0
id1,2017-04-27 02:36:00,cotton,3.5,A,03:14:00,2.0
id1,2017-04-27 02:36:30,cotton,3.5,A,03:14:00,2.0
id1,2017-04-27 02:37:00,cotton,3.5,B,03:13:00,3.0
id1,2017-04-27 02:37:30,cotton,3.5,B,03:13:00,4.0
id1,2017-04-27 02:38:00,cotton,3.5,B,03:13:00,5.0
id1,2017-04-27 02:38:30,cotton,3.5,C,03:13:00,98.0
id1,2017-04-27 02:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-04-27 02:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-04-27 02:40:00,cotton,3.5,Finish,00:01:00,100.0
id_B, ts_B,course,weight,Phase,remainingTime,progressPercentage
id2,2017-04-27 03:36:00,cotton,3.5,A,03:15:00,1.0
id2,2017-04-27 03:36:30,cotton,3.5,A,03:14:00,1.0
id2,2017-04-27 03:37:00,cotton,3.5,B,03:13:00,2.0
id2,2017-04-27 03:37:30,cotton,3.5,B,03:13:00,2.0
id2,2017-04-27 03:38:00,cotton,3.5,B,03:13:00,3.0
id2,2017-04-27 03:38:30,cotton,3.5,C,03:13:00,98.0
id2,2017-04-27 03:39:00,cotton,3.5,C,00:02:00,99.0
id2,2017-04-27 03:39:30,cotton,3.5,C,00:01:00,100.0
id2,2017-04-27 03:40:00,cotton,3.5,Finish,00:01:00,100.0
id_B, ts_B,course,weight,Phase,remainingTime,progressPercentage
id1,2017-05-27 01:35:30,cotton,3.5,A,03:15:00,1.0
id1,2017-05-27 01:37:30,cotton,3.5,B,03:13:00,2.0
id1,2017-05-27 01:38:00,cotton,3.5,B,03:13:00,3.0
id1,2017-05-27 01:38:30,cotton,3.5,C,03:13:00,4.0
id1,2017-05-27 01:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-05-27 01:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-05-27 01:40:00,cotton,3.5,Finish,00:01:00,100.0
id_B, ts_B,course,weight,Phase,remainingTime,progressPercentage
id1,2017-05-27 02:35:30,cotton,3.5,A,01:15:00,1.0
id1,2017-05-27 02:36:00,cotton,3.5,A,01:14:00,2.0
id1,2017-05-27 02:36:30,cotton,3.5,A,01:13:00,2.0
id1,2017-05-27 02:37:00,cotton,3.5,B,01:12:00,3.0
id1,2017-05-27 02:37:30,cotton,3.5,B,01:11:00,4.0
id1,2017-05-27 02:38:00,cotton,3.5,B,01:10:00,5.0
id1,2017-05-27 02:38:30,cotton,3.5,C,01:09:00,98.0
id1,2017-05-27 02:39:00,cotton,3.5,C,00:08:00,99.0
id1,2017-05-27 02:39:00,cotton,3.5,C,00:01:00,100.0
我一直在使用.shift()和groupby,如下所示:
a = dfb['Operation.progressPercentage'].shift().eq(100)
grouping = dfb.groupby([dfb.wm_id,a])
但它没有提供预期的结果。 请问有关如何更改代码以完成任务的任何帮助?
非常感谢提前。 最好的祝福, 卡罗
答案 0 :(得分:2)
您可以将progressPercentage等于100除以数据帧。如果它们是连续的,则删除先前的索引。然后将数据帧切片并追加到数组中。希望这有帮助
import numpy as np
df = pd.read_csv('input.csv',delimiter=',') # The input csv provided
df1 = df[(df["progressPercentage "]==100)]
x = (np.array(df1.index) + 1).tolist()
x.insert(0,0)
#Remove the consecutive elements so that they can be treated under one dataframe.
x = [ begin for begin, end in zip(x, x[1:]) if (begin != end-1)]
x.insert(len(x),df.shape[0])
frames = [df.iloc[begin:end] for begin, end in zip(x, x[1:])]
您可以使用for循环打印数据帧,即
for df in frames:
print(df)
数据帧的输出:
id_B ts_B course weight Phase remainingTime \ 0 id1 2017-04-27 01:35:30 cotton 3.5 A 01:15:00 1 id1 2017-04-27 01:37:30 cotton 3.5 B 01:13:00 2 id1 2017-04-27 01:38:00 cotton 3.5 B 01:13:00 3 id1 2017-04-27 01:38:30 cotton 3.5 C 01:13:00 4 id1 2017-04-27 01:39:00 cotton 3.5 C 00:02:00 5 id1 2017-04-27 01:39:30 cotton 3.5 C 00:01:00 6 id1 2017-04-27 01:40:00 cotton 3.5 Finish 00:01:00 progressPercentage 0 23.0 1 24.0 2 24.0 3 24.0 4 99.0 5 100.0 6 100.0 id_B ts_B course weight Phase remainingTime \ 7 id1 2017-04-27 02:35:30 cotton 3.5 A 03:15:00 8 id1 2017-04-27 02:36:00 cotton 3.5 A 03:14:00 9 id1 2017-04-27 02:36:30 cotton 3.5 A 03:14:00 10 id1 2017-04-27 02:37:00 cotton 3.5 B 03:13:00 11 id1 2017-04-27 02:37:30 cotton 3.5 B 03:13:00 12 id1 2017-04-27 02:38:00 cotton 3.5 B 03:13:00 13 id1 2017-04-27 02:38:30 cotton 3.5 C 03:13:00 14 id1 2017-04-27 02:39:00 cotton 3.5 C 00:02:00 15 id1 2017-04-27 02:39:30 cotton 3.5 C 00:01:00 16 id1 2017-04-27 02:40:00 cotton 3.5 Finish 00:01:00 progressPercentage 7 1.0 8 2.0 9 2.0 10 3.0 11 4.0 12 5.0 13 98.0 14 99.0 15 100.0 16 100.0 id_B ts_B course weight Phase remainingTime \ 17 id2 2017-04-27 03:36:00 cotton 3.5 A 03:15:00 18 id2 2017-04-27 03:36:30 cotton 3.5 A 03:14:00 19 id2 2017-04-27 03:37:00 cotton 3.5 B 03:13:00 20 id2 2017-04-27 03:37:30 cotton 3.5 B 03:13:00 21 id2 2017-04-27 03:38:00 cotton 3.5 B 03:13:00 22 id2 2017-04-27 03:38:30 cotton 3.5 C 03:13:00 23 id2 2017-04-27 03:39:00 cotton 3.5 C 00:02:00 24 id2 2017-04-27 03:39:30 cotton 3.5 C 00:01:00 25 id2 2017-04-27 03:40:00 cotton 3.5 Finish 00:01:00 progressPercentage 17 1.0 18 1.0 19 2.0 20 2.0 21 3.0 22 98.0 23 99.0 24 100.0 25 100.0 id_B ts_B course weight Phase remainingTime \ 26 id1 2017-05-27 01:35:30 cotton 3.5 A 03:15:00 27 id1 2017-05-27 01:37:30 cotton 3.5 B 03:13:00 28 id1 2017-05-27 01:38:00 cotton 3.5 B 03:13:00 29 id1 2017-05-27 01:38:30 cotton 3.5 C 03:13:00 30 id1 2017-05-27 01:39:00 cotton 3.5 C 00:02:00 31 id1 2017-05-27 01:39:30 cotton 3.5 C 00:01:00 32 id1 2017-05-27 01:40:00 cotton 3.5 Finish 00:01:00 progressPercentage 26 23.0 27 24.0 28 24.0 29 24.0 30 99.0 31 100.0 32 100.0 id_B ts_B course weight Phase remainingTime \ 33 id1 2017-05-27 02:35:30 cotton 3.5 A 01:15:00 34 id1 2017-05-27 02:36:00 cotton 3.5 A 01:14:00 35 id1 2017-05-27 02:36:30 cotton 3.5 A 01:13:00 36 id1 2017-05-27 02:37:00 cotton 3.5 B 01:12:00 37 id1 2017-05-27 02:37:30 cotton 3.5 B 01:11:00 38 id1 2017-05-27 02:38:00 cotton 3.5 B 01:10:00 39 id1 2017-05-27 02:38:30 cotton 3.5 C 01:09:00 40 id1 2017-05-27 02:39:00 cotton 3.5 C 00:08:00 41 id1 2017-05-27 02:39:00 cotton 3.5 C 00:08:00 progressPercentage 33 1.0 34 2.0 35 2.0 36 3.0 37 4.0 38 5.0 39 98.0 40 99.0 41 100.0 id_B ts_B course weight Phase remainingTime \ 42 id2 2017-04-27 03:36:00 cotton 3.5 A 03:15:00 43 id2 2017-04-27 03:36:30 cotton 3.5 A 03:14:00 44 id2 2017-04-27 03:37:00 cotton 3.5 B 03:13:00 45 id2 2017-04-27 03:37:30 cotton 3.5 B 03:13:00 46 id2 2017-04-27 03:38:00 cotton 3.5 B 03:13:00 47 id2 2017-04-27 03:38:30 cotton 3.5 C 03:13:00 48 id2 2017-04-27 03:39:00 cotton 3.5 C 00:02:00 49 id2 2017-04-27 03:39:30 cotton 3.5 C 00:01:00 50 id2 2017-04-27 03:40:00 cotton 3.5 Finish 00:01:00 progressPercentage 42 1.0 43 1.0 44 2.0 45 2.0 46 3.0 47 98.0 48 99.0 49 100.0 50 100.0
答案 1 :(得分:2)
如果Finish
值有时会丢失,只需使用progressPercentage
列:
shifted = df['progressPercentage'].shift()
#compare difference for second 100 if together 100 (e.g. 15, 16 row)
m = shifted.diff(-1).ne(0) & shifted.eq(100)
a = m.cumsum()
aa = df.groupby([df.id_B,a])
for k, gp in aa:
print('key=' + str(k))
print(gp)
print('A NEW ONE...')
key=('id1', 0)
id_B ts_B course weight Phase remainingTime \
0 id1 2017-04-27 01:35:30 cotton 3.5 A 01:15:00
1 id1 2017-04-27 01:37:30 cotton 3.5 B 01:13:00
2 id1 2017-04-27 01:38:00 cotton 3.5 B 01:13:00
3 id1 2017-04-27 01:38:30 cotton 3.5 C 01:13:00
4 id1 2017-04-27 01:39:00 cotton 3.5 C 00:02:00
5 id1 2017-04-27 01:39:30 cotton 3.5 C 00:01:00
6 id1 2017-04-27 01:40:00 cotton 3.5 Finish 00:01:00
progressPercentage
0 23.0
1 24.0
2 24.0
3 24.0
4 99.0
5 100.0
6 100.0
A NEW ONE...
key=('id1', 1)
id_B ts_B course weight Phase remainingTime \
7 id1 2017-04-27 02:35:30 cotton 3.5 A 03:15:00
8 id1 2017-04-27 02:36:00 cotton 3.5 A 03:14:00
9 id1 2017-04-27 02:36:30 cotton 3.5 A 03:14:00
10 id1 2017-04-27 02:37:00 cotton 3.5 B 03:13:00
11 id1 2017-04-27 02:37:30 cotton 3.5 B 03:13:00
12 id1 2017-04-27 02:38:00 cotton 3.5 B 03:13:00
13 id1 2017-04-27 02:38:30 cotton 3.5 C 03:13:00
14 id1 2017-04-27 02:39:00 cotton 3.5 C 00:02:00
15 id1 2017-04-27 02:39:30 cotton 3.5 C 00:01:00
16 id1 2017-04-27 02:40:00 cotton 3.5 Finish 00:01:00
progressPercentage
7 1.0
8 2.0
9 2.0
10 3.0
11 4.0
12 5.0
13 98.0
14 99.0
15 100.0
16 100.0
A NEW ONE...
key=('id2', 2)
...
答案 2 :(得分:0)
我发现的最佳方式如下:
a = dfb['progressPercentage'].shift().eq(100).cumsum()
df_output = dfb.groupby([dfb.id_B,a])
for k, gp in aa:
print('key=' + str(k))
print(gp.sort_values(['eventTime', 'wm_id'], ascending=[1, 0]).to_string())
print('A NEW ONE...')