How to group by and assign an array to a column in python-pandas?

Time: 2016-09-28 13:55:48

Tags: python pandas numpy dataframe

Given a data frame df like this:

a     b    
2     nan
3     nan
3     nan
4     nan
4     nan
4     nan 
5     nan
5     nan 
5     nan
5     nan
...

The key rule is that each number n in a is repeated for n-1 rows. My expected output is:

a     b    
2     1
3     1
3     2
4     1
4     2
4     3
5     1
5     2
5     3
5     4
...

So for each number n in a, the values in b run from 1 to n-1. I tried it this way:

df.groupby('a').apply(lambda x: np.asarray(range(x['a'].unique()[0]))) 

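But the result is one list per row (one row for each group), which is not what I want.

Could you tell me how to implement this? Thanks in advance!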

2 answers:

Answer 0: (score: 3)

You need cumcount, which numbers the rows within each group starting from 0:

df['b'] = df.groupby('a').cumcount() + 1
print(df)
   a  b
0  2  1
1  3  1
2  3  2
3  4  1
4  4  2
5  4  3
6  5  1
7  5  2
8  5  3
9  5  4
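
For anyone who wants to reproduce this end to end, here is a minimal sketch. The sample frame is an assumption: it is rebuilt with np.repeat so that each value n in a appears n-1 times, as in the question, and the variable name values is only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed sample data: each value n in 'a' appears n - 1 times (2, 3, 3, 4, 4, 4, ...)
values = np.arange(2, 6)
df = pd.DataFrame({"a": np.repeat(values, values - 1)})

# cumcount numbers the rows of each group as 0, 1, 2, ..., so add 1 to start at 1
df["b"] = df.groupby("a").cumcount() + 1
print(df)
```

cumcount counts rows in order of appearance within each group, so this should not depend on a being sorted.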

Answer 1: (score: 1)

# make a column that is 0 on the first occurrence of a number in a and 1 after
df['is_duplicated'] = df.duplicated(['a']).astype(int)

# group by values of a and get the cumulative sum of duplicates
# add one since the first occurrence in each group has a cumulative sum of 0
df['b'] = df.groupby('a')['is_duplicated'].cumsum() + 1
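
A self-contained sketch of the same idea, for checking it against cumcount. The sample data and the names values, is_dup and expected are assumptions for illustration; the duplicate flags are kept in a separate Series here so no helper column is left behind in df.

```python
import numpy as np
import pandas as pd

# Assumed sample data: each value n in 'a' appears n - 1 times
values = np.arange(2, 6)
df = pd.DataFrame({"a": np.repeat(values, values - 1)})

# 0 for the first occurrence of each value in 'a', 1 for every later repeat
is_dup = df.duplicated(["a"]).astype(int)

# running total of repeats within each group, shifted so counting starts at 1
df["b"] = is_dup.groupby(df["a"]).cumsum() + 1

# compare with the cumcount approach
expected = df.groupby("a").cumcount() + 1
print((df["b"] == expected).all())
```

Both approaches should give the same column on data like this; cumcount is the shorter of the two.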