Given a dataframe df like:
a b
2 nan
3 nan
3 nan
4 nan
4 nan
4 nan
5 nan
5 nan
5 nan
5 nan
...
The key rule is that each number n in a is repeated n-1 times. My expected output is:
a b
2 1
3 1
3 2
4 1
4 2
4 3
5 1
5 2
5 3
5 4
...
So for each value n in a, the numbers in b run from 1 to n-1. I tried it this way:
df.groupby('a').apply(lambda x: np.asarray(range(x['a'].unique()[0])))
But the result puts each list into a single row, which is not what I want.
Could you tell me how to implement this? Thanks in advance!
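To reproduce the example, here is a minimal sketch that builds df from the stated rule. It assumes a runs from 2 to 5 only, matching the visible rows; the "..." in the question is not filled in. Because every n appears exactly n-1 times, the target b can also be written down directly as 1..n-1 for each n, which gives something to check any answer against.

```python
import numpy as np
import pandas as pd

# Assumption: a covers 2..5 only (the visible rows); each n appears n-1 times.
values = np.arange(2, 6)
df = pd.DataFrame({
    'a': np.repeat(values, values - 1),  # 2, 3, 3, 4, 4, 4, 5, 5, 5, 5
    'b': np.nan,
})

# The target b straight from the rule: 1..n-1 for each n, concatenated.
expected_b = np.concatenate([np.arange(1, n) for n in values])

print(df['a'].tolist())
print(expected_b.tolist())
```

Note that the attempt above returns one value per group from groupby(...).apply, which is why each list lands in a single row instead of being spread across the group's rows.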
Answer 0 (score: 3)
You need cumcount:
df['b'] = df.groupby('a').cumcount() + 1
print (df)
a b
0 2 1
1 3 1
2 3 2
3 4 1
4 4 2
5 4 3
6 5 1
7 5 2
8 5 3
9 5 4
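A self-contained version of this answer, for anyone who wants to run it as-is. The input frame is a hypothetical stand-in for the question's data (a from 2 to 5 only). cumcount numbers the rows of each group 0, 1, 2, ... in their original order, so adding 1 starts the count at 1. It does not rely on the rows of a group being adjacent, which the second small frame illustrates.

```python
import numpy as np
import pandas as pd

# Hypothetical input mirroring the question (a from 2 to 5 only).
df = pd.DataFrame({'a': [2, 3, 3, 4, 4, 4, 5, 5, 5, 5], 'b': np.nan})

# Number rows within each group of 'a' as 0, 1, 2, ... then shift to start at 1.
df['b'] = df.groupby('a').cumcount() + 1
print(df)

# Groups need not be contiguous: rows are counted per value in order of appearance.
mixed = pd.DataFrame({'a': [3, 4, 3, 4, 4]})
mixed['b'] = mixed.groupby('a').cumcount() + 1
print(mixed['b'].tolist())  # [1, 1, 2, 2, 3]
```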
Answer 1 (score: 1)
# make a column that is 0 on the first occurrence of a number in a and 1 after
df['is_duplicated'] = df.duplicated(['a']).astype(int)
# group by values of a and get the cumulative sum of duplicates
# add one since the first occurrence has a value of 0
df['b'] = df.groupby('a')['is_duplicated'].cumsum() + 1
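A runnable sketch of this second approach on the same hypothetical input, with a check that it agrees with the cumcount answer. Within each group, the number of repeats seen up to a row equals that row's position in the group, so the two methods give the same numbering. Dropping the helper column at the end is an extra cleanup step, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical input mirroring the question (a from 2 to 5 only).
df = pd.DataFrame({'a': [2, 3, 3, 4, 4, 4, 5, 5, 5, 5], 'b': np.nan})

# 0 on the first occurrence of each value in 'a', 1 on every repeat.
df['is_duplicated'] = df.duplicated(['a']).astype(int)

# Running total of repeats within each group; +1 so the first row is 1.
df['b'] = df.groupby('a')['is_duplicated'].cumsum() + 1

# Compare with the cumcount-based numbering.
same = (df['b'] == df.groupby('a').cumcount() + 1).all()
print(same)

# Remove the helper column once b is filled.
df = df.drop(columns='is_duplicated')
print(df)
```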