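I have a pandas DataFrame (df) and need to generate sequence numbers for duplicate rows (i.e. rows that share the same values). For example, here is my df: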
P_Id Time_Point Date
B001 0 2015-07-22
B001 0 2015-07-22
B001 0 2015-07-22
B001 0 2015-07-22
B001 0 2015-07-22
B001 3 2015-10-01
B001 3 2015-10-01
B001 3 2015-10-01
B001 3 2015-10-01
B001 3 2015-10-01
B001 12 2016-08-01
B001 12 2016-08-01
B001 12 2016-08-01
B001 12 2016-08-01
B001 12 2016-08-01
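As you can see, there are duplicate rows with the same ID (B001), the same time point, and the same date. I would like another column that holds a sequence number within each such pattern. The resulting df should look like this: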
P_Id Time_Point Date Seq
B001 0 2015-07-22 1
B001 0 2015-07-22 2
B001 0 2015-07-22 3
B001 0 2015-07-22 4
B001 0 2015-07-22 5
B001 3 2015-10-01 1
B001 3 2015-10-01 2
B001 3 2015-10-01 3
B001 3 2015-10-01 4
B001 12 2016-08-01 1
B001 12 2016-08-01 2
B001 12 2016-08-01 3
Answer 0 (score: 3)
Use groupby with GroupBy.cumcount, then add the scalar 1, because cumcount starts counting at 0:
df['Seq'] = df.groupby(['P_Id', 'Time_Point', 'Date']).cumcount().add(1)
print(df)
P_Id Time_Point Date Seq
0 B001 0 2015-07-22 1
1 B001 0 2015-07-22 2
2 B001 0 2015-07-22 3
3 B001 0 2015-07-22 4
4 B001 0 2015-07-22 5
5 B001 3 2015-10-01 1
6 B001 3 2015-10-01 2
7 B001 3 2015-10-01 3
8 B001 3 2015-10-01 4
9 B001 3 2015-10-01 5
10 B001 12 2016-08-01 1
11 B001 12 2016-08-01 2
12 B001 12 2016-08-01 3
13 B001 12 2016-08-01 4
14 B001 12 2016-08-01 5
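
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question (an assumption about how the data was loaded; Date is kept as plain strings here), and `+ 1` is used as an equivalent of `.add(1)`:

```python
import pandas as pd

# Rebuild the sample data from the question: 3 patterns, 5 duplicate rows each.
df = pd.DataFrame({
    "P_Id": ["B001"] * 15,
    "Time_Point": [0] * 5 + [3] * 5 + [12] * 5,
    "Date": ["2015-07-22"] * 5 + ["2015-10-01"] * 5 + ["2016-08-01"] * 5,
})

# cumcount() numbers the rows inside each group starting at 0,
# in their original order, so adding 1 gives a 1-based sequence.
df["Seq"] = df.groupby(["P_Id", "Time_Point", "Date"]).cumcount() + 1

print(df)
```

Because cumcount follows the existing row order within each group, sort the frame first if the sequence should follow some other ordering.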