This is my data:
print(n0data)
                          FULL_MPID             DateTime  EquipID  count
Index
1      5092761672035390000000000000  2018-11-28 00:36:00     1296      1
2      5092761672035390000000000000  2018-11-28 00:37:00     1634      2
3      5092761672035390000000000000  2018-11-28 13:36:00     1296      3
4      5092761672035390000000000000  2018-11-28 13:38:00     1634      4
5      5092761672035390000000000000  2018-11-29 17:37:00     1290      5
6      5092761672035390000000000000  2018-11-29 17:37:00     1634      6
7      5092761672035390000000000000  2018-11-30 21:23:00     1290      7
8      5092761672035390000000000000  2018-11-30 21:24:00     1634      8
9      5092761672035390000000000000  2018-12-02 09:37:00     1296      9
10     5092761672035390000000000000  2018-12-02 09:39:00     1634     10
11     5092761672035390000000000000  2018-12-02 09:39:00     1634     11
12     5092761672035390000000000000  2018-12-03 11:55:00     1290     12
13     5092761672035390000000000000  2018-12-03 12:02:00     1634     13
14     5092761672035390000000000000  2018-12-06 12:22:00     1290     14
15     5092761672035390000000000000  2018-12-06 12:22:00     1634     15
16     5092761672035390000000000000  2018-12-06 12:22:00     1634     16
17     5092761672035390000000000000  2018-12-06 12:23:00     1634     17
18     5092761672035390000000000000  2018-12-06 12:23:00     1634     18
19     5092761672035390000000000000  2018-12-06 12:23:00     1634     19
20     5092761672035390000000000000  2018-12-06 12:23:00     1634     20
21     5092761672035390000000000000  2018-12-06 12:23:00     1634     21
22     5092761672035390000000000000  2018-12-09 05:51:00     1290     22
I have a groupby call that creates an ecount column with the following command:
n0data['ecount'] = n0data.groupby(['EquipID','FULL_MPID']).cumcount() + 1
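The data is sorted by time, and the goal is to identify when the EquipID changes.
The count should work like this:
When the value in the EquipID column changes from one value to another, ecount should reset. However, if the EquipID does not change, as in index rows 15-21, ecount should keep counting. I thought this was what groupby would give me...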
Answer 0: (score: 1)
You can use the shift and cumsum trick before the groupby:
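# True on the first row of every run of identical consecutive EquipID values
v = df.EquipID.ne(df.EquipID.shift())
# v.cumsum() labels each run; cumcount() numbers the rows inside each run
v.groupby(v.cumsum()).cumcount() + 1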
Index
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
11 2
12 1
13 1
14 1
15 1
16 2
17 3
18 4
19 5
20 6
21 7
22 1
dtype: int64
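
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes only the EquipID values from index rows 9-17 of the question (the DateTime column is left out, and the names ecount_global, changed and run_id are mine, not from the original post). It puts the original groupby next to the run-based count so the difference is visible:

```python
import pandas as pd

# EquipID values copied from index rows 9-17 of the question, for illustration only.
df = pd.DataFrame(
    {
        "FULL_MPID": ["5092761672035390000000000000"] * 9,
        "EquipID": [1296, 1634, 1634, 1290, 1634, 1290, 1634, 1634, 1634],
    },
    index=pd.RangeIndex(9, 18, name="Index"),
)

# Original attempt: groups by key, so every 1634 row lands in the same group
# no matter where it sits in the frame, and the counter never resets.
df["ecount_global"] = df.groupby(["EquipID", "FULL_MPID"]).cumcount() + 1
# ecount_global: 1, 1, 2, 1, 3, 2, 4, 5, 6

# Run-based count: a new run starts wherever EquipID differs from the row above.
changed = df["EquipID"].ne(df["EquipID"].shift())  # first row compares against NaN, so it is True
run_id = changed.cumsum()                          # 1, 2, 2, 3, 4, 5, 6, 6, 6
df["ecount"] = changed.groupby(run_id).cumcount() + 1
# ecount: 1, 1, 2, 1, 1, 1, 1, 2, 3

print(df[["EquipID", "ecount_global", "ecount"]])
```

The ecount values for rows 9-17 match the output above. The sample only has one FULL_MPID; if the real data has several, the run labels would probably also need to take FULL_MPID into account, but that is an assumption about data not shown in the question.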