I'm having trouble getting the unique values in a column as the value of a new column.
I have a df
id value1 valueNo
1 12 140
1 13 149
1 11 149
2 11 nan
2 11 150
3 15 145
3 12 149
The desired output is
id value1 valueNo uniqueNo
1 12 140 140, 149
1 13 149 140, 149
1 11 149 140, 149
2 11 nan 150
2 11 150 150
3 15 145 145, 149
3 12 149 145, 149
I tried a few approaches, but none of them worked for me.
df['uniqueNo']=df.groupby(['id'])['valueNo'].apply(lambda x: x.unique())
df['uniqueNo'] = df.groupby(['id'])['valueNo'].apply(list)
Answer 0 (score: 2)
If missing values are acceptable, use GroupBy.transform with unique:
df['uniqueNo']=df.groupby(['id'])['valueNo'].transform('unique')
print (df)
id value1 valueNo uniqueNo
0 1 12 140.0 [140.0, 149.0]
1 1 13 149.0 [140.0, 149.0]
2 1 11 149.0 [140.0, 149.0]
3 2 11 NaN [nan, 150.0]
4 2 11 150.0 [nan, 150.0]
5 3 15 145.0 [145.0, 149.0]
6 3 12 149.0 [145.0, 149.0]
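Whether `transform` accepts the string `'unique'` may depend on the pandas version, since `unique` does not reduce a group to a single scalar. Below is a minimal sketch of an equivalent that avoids relying on it, assuming the sample data from the question: aggregate once per `id` with `unique`, then broadcast the result back to the rows with `map`. The name `per_id` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3],
    "value1": [12, 13, 11, 11, 11, 15, 12],
    "valueNo": [140, 149, 149, np.nan, 150, 145, 149],
})

# One array of unique values per id; NaN is kept as a value here
per_id = df.groupby("id")["valueNo"].unique()

# Broadcast each group's array back to its rows through the id column
df["uniqueNo"] = df["id"].map(per_id)
print(df)
```

Because `per_id` is indexed by `id`, mapping through `df['id']` sidesteps the index-alignment problem that direct assignment of the grouped result runs into.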
If they need to be removed, drop them first, aggregate with unique, and map the result to a new column:
s = df.dropna(subset=['valueNo'])['valueNo'].astype(int).groupby(df['id']).unique()
#if converting to integers is not necessary
#s = df.dropna(subset=['valueNo']).groupby('id')['valueNo'].unique()
df['uniqueNo'] = df['id'].map(s)
print (df)
id value1 valueNo uniqueNo
0 1 12 140.0 [140, 149]
1 1 13 149.0 [140, 149]
2 1 11 149.0 [140, 149]
3 2 11 NaN [150]
4 2 11 150.0 [150]
5 3 15 145.0 [145, 149]
6 3 12 149.0 [145, 149]
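The output above holds arrays, while the question shows comma-separated text such as `140, 149`. If that exact string form is wanted, one possible sketch, again assuming the question's sample data, is to join the values inside a custom aggregation. The helper name `join_unique` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3],
    "value1": [12, 13, 11, 11, 11, 15, 12],
    "valueNo": [140, 149, 149, np.nan, 150, 145, 149],
})

def join_unique(values: pd.Series) -> str:
    """Join a group's distinct non-missing values as 'a, b', in order of appearance."""
    uniq = values.dropna().astype(int).unique()
    return ", ".join(str(v) for v in uniq)

# One string per id, then mapped back onto every row of that id
s = df.groupby("id")["valueNo"].agg(join_unique)
df["uniqueNo"] = df["id"].map(s)
print(df)
```

A group whose values are all missing would get an empty string under this sketch; whether that is the right placeholder depends on how the column is used afterwards.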