My DataFrame has two columns: user and lang. Each user knows one or more languages:
lang user
0 Python Mike
1 Scala Mike
2 R John
3 Julia Michael
4 Java Michael
For every row, I need to get all the languages known by that row's user. I can do this:
df.groupby('user')['lang'].apply(lambda x:', '.join(x)).reset_index()
But that gives me one row per user:
user lang
0 John R
1 Michael Julia, Java
2 Mike Python, Scala
instead of what I want, which keeps one row per original row:
lang user
0 Python,Scala Mike
1 Python,Scala Mike
2 R John
3 Julia,Java Michael
4 Julia,Java Michael
Code to reproduce:
import pandas as pd
df = pd.DataFrame({"lang":["Python","Scala","R","Julia","Java"],
"user":["Mike","Mike","John","Michael","Michael"]})
print(df)
Answer 0 (score: 4)
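Use transform to "broadcast" the groupby result back to every row of the input. Unlike apply, which here returns one value per group, transform returns a result aligned with the original index.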
df['lang'] = df.groupby('user')['lang'].transform(', '.join)
df
lang user
0 Python, Scala Mike
1 Python, Scala Mike
2 R John
3 Julia, Java Michael
4 Julia, Java Michael
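To make the difference concrete, here is a minimal sketch that puts the aggregated and the broadcast results side by side on the same sample data. The column name all_langs is a hypothetical name chosen for illustration, and the map-based variant is just one alternative, not something the answer above prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    "lang": ["Python", "Scala", "R", "Julia", "Java"],
    "user": ["Mike", "Mike", "John", "Michael", "Michael"],
})

# apply aggregates: the result has one entry per user (indexed by user).
per_user = df.groupby("user")["lang"].apply(", ".join)

# transform broadcasts: the result has one entry per original row,
# sharing df's index, so it can be assigned straight back.
per_row = df.groupby("user")["lang"].transform(", ".join)

# Alternative sketch: look up each row's user in the aggregated Series.
mapped = df["user"].map(per_user)

# Keep the original lang column and add the joined string as a new,
# hypothetically named column instead of overwriting lang.
result = df.assign(all_langs=per_row)
print(result)
```

Writing to a new column with assign leaves the original lang values intact, which may be preferable if the per-row language is still needed later.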