Storing the unique values of a group in another column in Pandas

Time: 2019-06-06 10:30:05

Tags: python pandas group-by unique

I'm having trouble getting the unique values of a column, per group, as the value of a new column.

I have this df:

id  value1 valueNo
1    12      140
1    13      149
1    11      149
2    11      nan
2    11      150
3    15      145
3    12      149

The desired output is:

id  value1 valueNo   uniqueNo
1    12      140      140, 149
1    13      149      140, 149
1    11      149      140, 149
2    11      nan      150
2    11      150      150
3    15      145      145, 149
3    12      149      145, 149

I tried a few approaches, but none of them worked for me:

df['uniqueNo']=df.groupby(['id'])['valueNo'].apply(lambda x: x.unique())
df['uniqueNo'] = df.groupby(['id'])['valueNo'].apply(list)
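
A likely reason both attempts leave most rows empty is index alignment: the grouped result is indexed by `id` (1, 2, 3), while the frame is indexed 0 to 6, so the assignment only fills the rows whose index label happens to match an `id`. The following is a minimal sketch of that behaviour, assuming the frame above is rebuilt by hand with a default integer index:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (default index 0..6).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3],
    "value1": [12, 13, 11, 11, 11, 15, 12],
    "valueNo": [140, 149, 149, np.nan, 150, 145, 149],
})

# One list per group; the result is indexed by id, not by row position.
per_group = df.groupby("id")["valueNo"].apply(list)

# Assignment aligns on index labels: only rows labelled 1, 2, 3 receive a value.
df["uniqueNo"] = per_group
print(per_group.index.tolist())
print(df["uniqueNo"].isna().sum())
```

Mapping the grouped result through `df['id']` instead of assigning it directly avoids this mismatch.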

1 Answer:

Answer 0: (score: 2)

If missing values are acceptable, use GroupBy.transform with unique:

df['uniqueNo']=df.groupby(['id'])['valueNo'].transform('unique')
print (df)
   id  value1  valueNo        uniqueNo
0   1      12    140.0  [140.0, 149.0]
1   1      13    149.0  [140.0, 149.0]
2   1      11    149.0  [140.0, 149.0]
3   2      11      NaN    [nan, 150.0]
4   2      11    150.0    [nan, 150.0]
5   3      15    145.0  [145.0, 149.0]
6   3      12    149.0  [145.0, 149.0]
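
In case `transform('unique')` is not accepted by the installed pandas version (an assumption, not something stated in this answer), the same per-row arrays can be sketched by aggregating with `unique` and mapping the result back through `id`. Missing values are kept here, as in the output above:

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3],
    "value1": [12, 13, 11, 11, 11, 15, 12],
    "valueNo": [140, 149, 149, np.nan, 150, 145, 149],
})

# One array of unique values per id (NaN is kept as a value).
s = df.groupby("id")["valueNo"].unique()

# Broadcast each group's array to every row of that group.
df["uniqueNo"] = df["id"].map(s)
print(df)
```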

If they need to be removed, drop them first, aggregate with unique, and map the result to the new column:

s = df.dropna(subset=['valueNo'])['valueNo'].astype(int).groupby(df['id']).unique()
# if converting to integers is not necessary
#s = df.dropna(subset=['valueNo']).groupby('id')['valueNo'].unique()
df['uniqueNo'] = df['id'].map(s)
print (df)
   id  value1  valueNo    uniqueNo
0   1      12    140.0  [140, 149]
1   1      13    149.0  [140, 149]
2   1      11    149.0  [140, 149]
3   2      11      NaN       [150]
4   2      11    150.0       [150]
5   3      15    145.0  [145, 149]
6   3      12    149.0  [145, 149]
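
The desired output in the question shows comma-separated text (`140, 149`) rather than arrays. A minimal sketch of that variant follows, assuming the same drop, aggregate, and map steps and joining each group's unique values into a string; the lambda and the `", "` separator are illustrative choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3],
    "value1": [12, 13, 11, 11, 11, 15, 12],
    "valueNo": [140, 149, 149, np.nan, 150, 145, 149],
})

# Drop missing values, cast to int, then join each group's unique values.
s = (
    df.dropna(subset=["valueNo"])["valueNo"]
      .astype(int)
      .groupby(df["id"])
      .agg(lambda x: ", ".join(map(str, x.unique())))
)

# Map the per-id string back onto every row.
df["uniqueNo"] = df["id"].map(s)
print(df)
```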