Here is my data:
id Product
3 ye
4 rt
3 re
4 ri
52 rs
34 rd
32 re
34 rd
32 re
3 re
I want to add new_id and occurence columns so that it is easier to see how often each id occurs in this table:
id Product new_id occurence
3 ye 1 1
4 rt 2 1
3 re 1 2
4 ri 2 2
52 rs 3 1
34 rd 4 1
32 re 5 1
34 rd 4 2
32 re 5 2
3 re 1 3
Answer 0 (score: 4)
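An approach based on a dict and groupby cumcount(), i.e.:
# map each distinct id, in order of first appearance, to 1, 2, 3, ... (zip stops at the shorter input)
new = dict(zip(df.drop_duplicates(['id'])['id'],df.reset_index().index+1))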
df['new_id'] = df['id'].map(new)
df['occurance'] = df.groupby('id').cumcount()+1
id Product occurance new_id
0 3 ye 1 1
1 4 rt 1 2
2 3 re 2 1
3 4 ri 2 2
4 52 rs 1 3
5 34 rd 1 4
6 32 re 1 5
7 34 rd 2 4
8 32 re 2 5
9 3 re 3 1
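For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame construction is my own transcription of the data in the question, `unique_ids` and `id_to_new` are illustrative names, and I use the question's column spelling `occurence`. The dict is built with `enumerate` instead of `zip`, which should give the same mapping.

```python
import pandas as pd

# Sample data transcribed from the question (assumed to be integer ids).
df = pd.DataFrame({
    "id": [3, 4, 3, 4, 52, 34, 32, 34, 32, 3],
    "Product": ["ye", "rt", "re", "ri", "rs", "rd", "re", "rd", "re", "re"],
})

# Distinct ids in order of first appearance, numbered from 1.
unique_ids = df["id"].drop_duplicates()
id_to_new = {old: new for new, old in enumerate(unique_ids, start=1)}

df["new_id"] = df["id"].map(id_to_new)
# cumcount() numbers the rows inside each id group from 0, so add 1.
df["occurence"] = df.groupby("id").cumcount() + 1

print(df)
```

Because the mapping is an ordinary dict, it can also be reused later to label new rows that contain the same ids.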
Answer 1 (score: 3)
Option 1
g = df.groupby('id')
df.assign(new_id=g.ngroup() + 1, occurence=g.cumcount() + 1)
id Product new_id occurence
0 3 ye 1 1
1 4 rt 2 1
2 3 re 1 2
3 4 ri 2 2
4 52 rs 5 1
5 34 rd 4 1
6 32 re 3 1
7 34 rd 4 2
8 32 re 3 2
9 3 re 1 3
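Note that in the output above 52 gets new_id 5 and 32 gets new_id 3, because groupby sorts the group keys by default and ngroup() follows that order. If the numbering should follow the order of first appearance, as in the question's expected table, passing sort=False to groupby appears to be enough. A small sketch, with the DataFrame transcribed from the question and `by_sorted_key` / `by_appearance` as illustrative names:

```python
import pandas as pd

# Sample data transcribed from the question.
df = pd.DataFrame({
    "id": [3, 4, 3, 4, 52, 34, 32, 34, 32, 3],
    "Product": ["ye", "rt", "re", "ri", "rs", "rd", "re", "rd", "re", "re"],
})

# Default (sort=True): groups are numbered in sorted key order 3, 4, 32, 34, 52.
by_sorted_key = df.groupby("id").ngroup() + 1

# sort=False: groups are numbered in the order their key first appears.
g = df.groupby("id", sort=False)
by_appearance = df.assign(new_id=g.ngroup() + 1, occurence=g.cumcount() + 1)

print(by_appearance)
```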
Option 2
df.assign(
    new_id=df.id.factorize()[0] + 1,
    occurence=df.groupby('id').cumcount() + 1)
id Product new_id occurence
0 3 ye 1 1
1 4 rt 2 1
2 3 re 1 2
3 4 ri 2 2
4 52 rs 3 1
5 34 rd 4 1
6 32 re 5 1
7 34 rd 4 2
8 32 re 5 2
9 3 re 1 3
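To see why Option 2 matches the expected table, it may help to look at what factorize() returns: a pair of integer codes (one per row, starting at 0) and the distinct values in order of first appearance. The sketch below uses a DataFrame transcribed from the question; `codes` and `uniques` are just local names for the two return values.

```python
import pandas as pd

# Sample data transcribed from the question.
df = pd.DataFrame({
    "id": [3, 4, 3, 4, 52, 34, 32, 34, 32, 3],
    "Product": ["ye", "rt", "re", "ri", "rs", "rd", "re", "rd", "re", "re"],
})

# codes[i] is the position of df["id"][i] within uniques.
codes, uniques = df["id"].factorize()

result = df.assign(
    new_id=codes + 1,  # shift from 0-based codes to 1-based ids
    occurence=df.groupby("id").cumcount() + 1,
)

print(list(uniques))
print(result)
```

Since `uniques` keeps the original ids, `uniques[new_id - 1]` recovers the old id from a new one if that is ever needed.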