Pandas: assign a cumulative count to consecutive values in a column

Time: 2018-12-18 04:53:03

Tags: python pandas dataframe group-by pandas-groupby

Here is my data:

print(n0data)

                          FULL_MPID            DateTime  EquipID  count
Index                                                                  
1      5092761672035390000000000000 2018-11-28 00:36:00     1296      1
2      5092761672035390000000000000 2018-11-28 00:37:00     1634      2
3      5092761672035390000000000000 2018-11-28 13:36:00     1296      3
4      5092761672035390000000000000 2018-11-28 13:38:00     1634      4
5      5092761672035390000000000000 2018-11-29 17:37:00     1290      5
6      5092761672035390000000000000 2018-11-29 17:37:00     1634      6
7      5092761672035390000000000000 2018-11-30 21:23:00     1290      7
8      5092761672035390000000000000 2018-11-30 21:24:00     1634      8
9      5092761672035390000000000000 2018-12-02 09:37:00     1296      9
10     5092761672035390000000000000 2018-12-02 09:39:00     1634     10
11     5092761672035390000000000000 2018-12-02 09:39:00     1634     11
12     5092761672035390000000000000 2018-12-03 11:55:00     1290     12
13     5092761672035390000000000000 2018-12-03 12:02:00     1634     13
14     5092761672035390000000000000 2018-12-06 12:22:00     1290     14
15     5092761672035390000000000000 2018-12-06 12:22:00     1634     15
16     5092761672035390000000000000 2018-12-06 12:22:00     1634     16
17     5092761672035390000000000000 2018-12-06 12:23:00     1634     17
18     5092761672035390000000000000 2018-12-06 12:23:00     1634     18
19     5092761672035390000000000000 2018-12-06 12:23:00     1634     19
20     5092761672035390000000000000 2018-12-06 12:23:00     1634     20
21     5092761672035390000000000000 2018-12-06 12:23:00     1634     21
22     5092761672035390000000000000 2018-12-09 05:51:00     1290     22

I have a groupby call that creates the ecount column with the following command:

n0data['ecount'] = n0data.groupby(['EquipID','FULL_MPID']).cumcount() + 1

The data is sorted by time, and the goal is to identify the times at which EquipID changes.

The count should be:

[image: ecount right]

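ecount should reset whenever the value in the EquipID column changes from one value to another. However, if EquipID does not change, as in index rows 15-21, ecount should keep counting. I thought that was what groupby provided...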
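For reference, here is a minimal sketch of what groupby(...).cumcount() does with repeated values. It uses a made-up five-row frame named toy, not the data above. The call numbers rows within each distinct EquipID across the whole frame, whether or not the rows are adjacent, so the count does not reset when EquipID changes and later comes back.

```python
import pandas as pd

# Hypothetical toy frame for illustration only (not the question's data).
toy = pd.DataFrame({"EquipID": [1296, 1634, 1296, 1634, 1634]})

# cumcount numbers rows per distinct EquipID over the whole frame,
# so the second 1296 gets 2 even though a 1634 row sits in between.
toy["ecount"] = toy.groupby("EquipID").cumcount() + 1

print(toy["ecount"].tolist())  # [1, 1, 2, 2, 3]
```

A count that resets on every change needs a grouping key that identifies each consecutive run, not each distinct value.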

1 answer:

Answer 0 (score: 1)

You can use the shift + cumsum trick to build the groupby key before calling cumcount:

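# True wherever EquipID differs from the previous row (df stands for the question's n0data)
v = df.EquipID.ne(df.EquipID.shift())
# cumsum of the flags labels each consecutive run; cumcount numbers rows within a run
v.groupby(v.cumsum()).cumcount() + 1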

Index
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    2
12    1
13    1
14    1
15    1
16    2
17    3
18    4
19    5
20    6
21    7
22    1
dtype: int64
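
As a self-contained sketch, the same idea can be written as a small helper and assigned back to the frame. The helper name run_count is hypothetical. Treating a change in FULL_MPID as the start of a new run is also an assumption, based only on the question grouping by both columns. The EquipID values are copied from the 22 rows in the question, so the result should match the Series shown above.

```python
import pandas as pd

# EquipID values copied from the question; FULL_MPID is constant in that sample.
equip = [1296, 1634, 1296, 1634, 1290, 1634, 1290, 1634, 1296, 1634, 1634,
         1290, 1634, 1290, 1634, 1634, 1634, 1634, 1634, 1634, 1634, 1290]
n0data = pd.DataFrame(
    {"FULL_MPID": "5092761672035390000000000000", "EquipID": equip},
    index=pd.RangeIndex(1, len(equip) + 1, name="Index"),
)


def run_count(frame, cols):
    """Hypothetical helper: 1-based position of each row within its run of
    consecutive rows that share the same values in `cols`."""
    key = frame[cols]
    # True on the first row and wherever any column in `cols` differs
    # from the previous row.
    changed = key.ne(key.shift()).any(axis=1)
    # The running sum of the change flags gives every run its own label.
    run_id = changed.cumsum()
    return frame.groupby(run_id).cumcount() + 1


n0data["ecount"] = run_count(n0data, ["FULL_MPID", "EquipID"])
print(n0data[["EquipID", "ecount"]])
```

This relies on the rows already being sorted by time, as the question states. With unsorted rows, the runs (and so the counts) would not be meaningful.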