How to relabel id based on occurrence in pandas

Time: 2017-09-28 10:01:34

Tags: python pandas dataframe

Here is my data:

id   Product
3    ye
4    rt
3    re
4    ri
52   rs
34   rd
32   re
34   rd
32   re
3    re

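I want to add new_id and occurence columns so that repeated ids are easier to spot in the table. new_id should number each distinct id in order of first appearance, and occurence should count how many times that id has appeared so far: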

id   Product  new_id  occurence
3    ye       1       1
4    rt       2       1
3    re       1       2
4    ri       2       2
52   rs       3       1
34   rd       4       1
32   re       5       1
34   rd       4       2
32   re       5       2
3    re       1       3

2 Answers:

Answer 0: (score: 4)

An approach based on a dict and groupby cumcount():

new = dict(zip(df.drop_duplicates(['id'])['id'],df.reset_index().index+1))
df['new_id'] = df['id'].map(new)

df['occurance'] = df.groupby('id').cumcount()+1

   id Product  occurance  new_id
0   3      ye          1       1
1   4      rt          1       2
2   3      re          2       1
3   4      ri          2       2
4  52      rs          1       3
5  34      rd          1       4
6  32      re          1       5
7  34      rd          2       4
8  32      re          2       5
9   3      re          3       1
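
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same dict + cumcount() idea. The DataFrame is rebuilt from the question's data; the helper name unique_ids is my own, and the column is spelled occurence as in the question.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "id": [3, 4, 3, 4, 52, 34, 32, 34, 32, 3],
    "Product": ["ye", "rt", "re", "ri", "rs", "rd", "re", "rd", "re", "re"],
})

# Distinct ids in order of first appearance (unique_ids is a hypothetical name).
unique_ids = df["id"].drop_duplicates()

# Map each distinct id to 1, 2, 3, ... in that order.
new = dict(zip(unique_ids, range(1, len(unique_ids) + 1)))
df["new_id"] = df["id"].map(new)

# cumcount() numbers the rows inside each id group starting from 0.
df["occurence"] = df.groupby("id").cumcount() + 1

print(df)
```

Using range(1, len(unique_ids) + 1) instead of df.reset_index().index+1 only makes the length of the numbering explicit; zip would truncate the longer sequence either way.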

Answer 1: (score: 3)

Option 1

g = df.groupby('id')
df.assign(new_id=g.ngroup() + 1, occurence=g.cumcount() + 1)

   id Product  new_id  occurence
0   3      ye       1          1
1   4      rt       2          1
2   3      re       1          2
3   4      ri       2          2
4  52      rs       5          1
5  34      rd       4          1
6  32      re       3          1
7  34      rd       4          2
8  32      re       3          2
9   3      re       1          3
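
Note that the new_id values above follow the sorted order of the ids (32 gets 3, 52 gets 5), which is not quite what the question asked for, because groupby sorts the group keys by default. A small sketch of a possible fix, assuming first-appearance order is what is wanted, is to pass sort=False. The DataFrame is rebuilt from the question's data and the name result is my own.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "id": [3, 4, 3, 4, 52, 34, 32, 34, 32, 3],
    "Product": ["ye", "rt", "re", "ri", "rs", "rd", "re", "rd", "re", "re"],
})

# sort=False keeps groups in order of first appearance,
# so ngroup() numbers them 0, 1, 2, ... in that order.
g = df.groupby("id", sort=False)
result = df.assign(new_id=g.ngroup() + 1, occurence=g.cumcount() + 1)

print(result)
```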

Option 2

df.assign(
    new_id=df.id.factorize()[0] + 1,
    occurence=df.groupby('id').cumcount() + 1)

   id Product  new_id  occurence
0   3      ye       1          1
1   4      rt       2          1
2   3      re       1          2
3   4      ri       2          2
4  52      rs       3          1
5  34      rd       4          1
6  32      re       5          1
7  34      rd       4          2
8  32      re       5          2
9   3      re       1          3
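
To see why the [0] is there: factorize() returns a pair, the integer codes and the distinct values in order of first appearance. A short sketch on just the id column (the names ids, codes, uniques and new_id are my own):

```python
import pandas as pd

# The id column from the question.
ids = pd.Series([3, 4, 3, 4, 52, 34, 32, 34, 32, 3], name="id")

# codes: one integer per row, starting at 0, assigned in order of first appearance.
# uniques: the distinct ids in that same order.
codes, uniques = ids.factorize()

# Shift by one so the labels start at 1, as in the question.
new_id = codes + 1

print(list(uniques))
print(new_id.tolist())
```

Because the codes follow first appearance rather than sorted order, this matches the table in the question without any extra arguments.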