Given the following dataset df:
uuid,eventTime,Op.progress,Op.progressPercentage, AnotherAttribute
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:39,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:49,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:18,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:49,P,5.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:54:27,P,5.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:07,P,6.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:27,P,6.0,01:53:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:33:46,W,40.0,01:13:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:10,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:16,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:18,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:55,P,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:15,P,1.0,01:59:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:31,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:51,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:22,P,4.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:51,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:22,S,98.0,00:04:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:27,S,98.0,00:03:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:30:27,S,99.0,00:02:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:31:27,S,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
I want to split it into two parts:
df1:
uuid,eventTime,Op.progress,Op.progressPercentage, AnotherAttribute
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:39,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:49,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:18,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:49,P,5.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:54:27,P,5.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:07,P,6.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:27,P,6.0,01:53:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:33:46,W,40.0,01:13:00
df2:
uuid,eventTime,Op.progress,Op.progressPercentage, AnotherAttribute
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:10,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:16,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:18,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:55,P,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:15,P,1.0,01:59:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:31,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:51,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:22,P,4.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:51,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:22,S,98.0,00:04:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:27,S,98.0,00:03:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:30:27,S,99.0,00:02:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:31:27,S,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
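The split should be based on the Op.progressPercentage attribute, which takes values between 1 and 100: a new part should start whenever the value drops back down.
When I apply the solution from "splitting a pandas Dataframe", shown below, I do not get the expected result.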
import pandas as pd

df_dataset = pd.read_csv(filepath)  # your input data saved here
wash_list = []
# Mask intended to flag the row after a run reaches 100
shifted = df_dataset['Op.progressPercentage'].shift()
m = shifted.diff(-1).ne(0) & shifted.eq(100)
a = m.cumsum()
aa = df_dataset.groupby([df_dataset.uuid, a])
for k, gp in aa:
    wash_list.append(gp.sort_values(['uuid', 'eventTime'], ascending=[True, True]))
for wash in wash_list:
    print("")
    print(wash.to_string())
    print("")
Any help would be greatly appreciated. Thanks in advance, best regards, Carlo
Answer 0: (score: 3)
IIUC (and leaving edge cases aside), you can use diff + cumsum to label the distinct groups, then groupby on those labels:
for _, g in df.groupby((~df['Op.progressPercentage']
                        .diff().fillna(0).ge(0)).cumsum()):
    print(g, '\n')
Details
The group labels look like this:
(~df['Op.progressPercentage'].diff().fillna(0).ge(0)).cumsum()
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 1
9 1
10 1
11 1
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1
22 1
Name: Op.progressPercentage, dtype: int64
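The labelling expression above is equivalent to diff().lt(0).cumsum(), because a NaN difference compares as False. Below is a minimal sketch of that form that also computes the difference per uuid, so a drop is only detected within the same uuid. The toy frame, the uuids 'A' and 'B', and the names step, run_id and parts are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical toy data: two uuids, each restarting its percentage once.
df = pd.DataFrame({
    'uuid': ['A'] * 5 + ['B'] * 4,
    'Op.progressPercentage': [3.0, 4.0, 40.0, 1.0, 2.0, 50.0, 60.0, 1.0, 5.0],
})

# Difference to the previous row, computed separately for each uuid.
step = df.groupby('uuid')['Op.progressPercentage'].diff()

# A negative step marks the first row of a new run. NaN (the first row of
# each uuid) compares as False, so no fillna is needed.
run_id = step.lt(0).cumsum().rename('run')

# One sub-frame per (uuid, run) pair, in order of appearance.
parts = [g for _, g in df.groupby(['uuid', run_id], sort=False)]

for part in parts:
    print(part, '\n')
```

Collecting the groups in a list (rather than only printing them) makes it possible to pick out df1, df2 and so on by position afterwards.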
Answer 1: (score: 2)
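np.diff computes the difference between each value and the one before it.
d < 0 shows where the value drops.
np.flatnonzero finds the positions of the non-zero values, which in our case are the True values.
np.diff returns one element fewer than the source array, so I add 1 to get the correct positions.
np.split splits df at the positions where the diff is negative.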
import numpy as np

d = np.diff(df['Op.progressPercentage'].values)  # change from each row to the next
results = np.split(df, np.flatnonzero(d < 0) + 1)  # cut before each row that follows a drop
print(*results, sep='\n' * 2)
uuid eventTime Op.progress Op.progressPercentage AnotherAttribute
0 C0972765-8436-0000-0000-000000000000 2017-08-19T12:52:39 P 3.0 01:57:00
1 C0972765-8436-0000-0000-000000000000 2017-08-19T12:52:49 P 3.0 01:56:00
2 C0972765-8436-0000-0000-000000000000 2017-08-19T12:53:18 P 4.0 01:55:00
3 C0972765-8436-0000-0000-000000000000 2017-08-19T12:53:49 P 5.0 01:55:00
4 C0972765-8436-0000-0000-000000000000 2017-08-19T12:54:27 P 5.0 01:54:00
5 C0972765-8436-0000-0000-000000000000 2017-08-19T12:55:07 P 6.0 01:54:00
6 C0972765-8436-0000-0000-000000000000 2017-08-19T12:55:27 P 6.0 01:53:00
7 C0972765-8436-0000-0000-000000000000 2017-08-19T13:33:46 W 40.0 01:13:00
uuid eventTime Op.progress Op.progressPercentage AnotherAttribute
8 C0972765-8436-0000-0000-000000000000 2017-08-19T13:40:10 N 1.0 02:00:00
9 C0972765-8436-0000-0000-000000000000 2017-08-19T13:40:16 N 1.0 02:00:00
10 C0972765-8436-0000-0000-000000000000 2017-08-19T13:40:18 N 1.0 02:00:00
11 C0972765-8436-0000-0000-000000000000 2017-08-19T13:40:55 P 1.0 02:00:00
12 C0972765-8436-0000-0000-000000000000 2017-08-19T13:41:15 P 1.0 01:59:00
13 C0972765-8436-0000-0000-000000000000 2017-08-19T13:41:31 P 3.0 01:57:00
14 C0972765-8436-0000-0000-000000000000 2017-08-19T13:41:51 P 3.0 01:56:00
15 C0972765-8436-0000-0000-000000000000 2017-08-19T13:42:22 P 4.0 01:56:00
16 C0972765-8436-0000-0000-000000000000 2017-08-19T13:42:51 P 4.0 01:55:00
17 C0972765-8436-0000-0000-000000000000 2017-08-19T15:29:22 S 98.0 00:04:00
18 C0972765-8436-0000-0000-000000000000 2017-08-19T15:29:27 S 98.0 00:03:00
19 C0972765-8436-0000-0000-000000000000 2017-08-19T15:30:27 S 99.0 00:02:00
20 C0972765-8436-0000-0000-000000000000 2017-08-19T15:31:27 S 100.0 00:01:00
21 C0972765-8436-0000-0000-000000000000 2017-08-19T15:33:01 F 100.0 00:01:00
22 C0972765-8436-0000-0000-000000000000 2017-08-19T15:33:01 F 100.0 00:01:00
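The np.split call in this answer relies on NumPy treating the DataFrame as array-like. My understanding, which I have not checked against specific releases, is that recent pandas versions may warn about this or reject it. Below is a sketch of the same idea that slices with iloc instead. The toy column and the names cuts and bounds are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for Op.progressPercentage.
df = pd.DataFrame(
    {'Op.progressPercentage': [3.0, 3.0, 6.0, 40.0, 1.0, 1.0, 98.0, 100.0]}
)

values = df['Op.progressPercentage'].to_numpy()
d = np.diff(values)                # d[i] = values[i + 1] - values[i]
cuts = np.flatnonzero(d < 0) + 1   # row positions where a new run starts

# Add the start and end of the frame so consecutive bounds delimit every run.
bounds = np.concatenate(([0], cuts, [len(df)]))
results = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

print(cuts)
print(*results, sep='\n' * 2)
```

Here the only drop is from 40.0 to 1.0, so cuts holds the single position 4 and results holds two frames, rows 0 to 3 and rows 4 to 7.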