How to generate sequence numbers for duplicate rows

Time: 2018-02-17 17:19:39

Tags: python pandas

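I have a pandas DataFrame (df), and I need to generate sequence numbers for duplicate rows, i.e. rows that hold the same values. For example, here is my df: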

P_Id    Time_Point     Date       
B001    0           2015-07-22
B001    0           2015-07-22
B001    0           2015-07-22
B001    0           2015-07-22
B001    0           2015-07-22
B001    3           2015-10-01
B001    3           2015-10-01
B001    3           2015-10-01
B001    3           2015-10-01
B001    3           2015-10-01
B001    12          2016-08-01
B001    12          2016-08-01
B001    12          2016-08-01
B001    12          2016-08-01
B001    12          2016-08-01

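As you can see, there are repeated rows that share the same ID (B001), the same time point, and the same date. I would like to add another column that numbers the rows within each such pattern. The resulting df should look like this: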

P_Id    Time_Point     Date        Seq     
B001    0           2015-07-22      1         
B001    0           2015-07-22      2         
B001    0           2015-07-22      3         
B001    0           2015-07-22      4         
B001    0           2015-07-22      5         
B001    3           2015-10-01      1          
B001    3           2015-10-01      2
B001    3           2015-10-01      3
B001    3           2015-10-01      4
B001    12          2016-08-01      1
B001    12          2016-08-01      2
B001    12          2016-08-01      3

1 Answer:

Answer 0 (score: 3)

Use groupby together with GroupBy.cumcount, then add the scalar 1 (cumcount numbers the rows of each group starting from 0):

df['Seq'] = df.groupby(['P_Id','Time_Point','Date']).cumcount().add(1)
print(df)
    P_Id  Time_Point        Date  Seq
0   B001           0  2015-07-22    1
1   B001           0  2015-07-22    2
2   B001           0  2015-07-22    3
3   B001           0  2015-07-22    4
4   B001           0  2015-07-22    5
5   B001           3  2015-10-01    1
6   B001           3  2015-10-01    2
7   B001           3  2015-10-01    3
8   B001           3  2015-10-01    4
9   B001           3  2015-10-01    5
10  B001          12  2016-08-01    1
11  B001          12  2016-08-01    2
12  B001          12  2016-08-01    3
13  B001          12  2016-08-01    4
14  B001          12  2016-08-01    5
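
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It assumes the sample frame can be rebuilt by hand with five identical rows per time point and with Date kept as plain strings; that construction is an assumption for illustration and is not part of the original question.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed layout:
# five identical rows for each of the three time points).
df = pd.DataFrame({
    "P_Id": ["B001"] * 15,
    "Time_Point": [0] * 5 + [3] * 5 + [12] * 5,
    "Date": ["2015-07-22"] * 5 + ["2015-10-01"] * 5 + ["2016-08-01"] * 5,
})

# cumcount() numbers the rows inside each group starting at 0,
# so add(1) shifts the numbering to start at 1.
df["Seq"] = df.groupby(["P_Id", "Time_Point", "Date"]).cumcount().add(1)

print(df["Seq"].tolist())
# [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
```

The numbering follows the existing row order within each group, so duplicate rows do not have to be adjacent for this to work.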