Concatenate the strings in each group and assign them back to the original DataFrame

Time: 2018-12-12 16:17:19

Tags: python pandas dataframe group-by pandas-groupby

My dataframe has two columns: user and lang. Each user knows one or more languages:

     lang     user
0  Python     Mike
1   Scala     Mike
2       R     John
3   Julia  Michael
4    Java  Michael

For every row, I need all the languages that the row's user knows. I can do:

df.groupby('user')['lang'].apply(lambda x:', '.join(x)).reset_index()

But that gives me one row per user:

      user           lang
0     John              R
1  Michael    Julia, Java
2     Mike  Python, Scala

That is not what I want. I want every original row kept, with the joined languages on each one:

           lang     user
0  Python,Scala     Mike
1  Python,Scala     Mike
2             R     John
3    Julia,Java  Michael
4    Julia,Java  Michael

Code to reproduce:

import pandas as pd

df = pd.DataFrame({"lang":["Python","Scala","R","Julia","Java"],
                   "user":["Mike","Mike","John","Michael","Michael"]})
print(df)
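One way to reuse the aggregation from the question, sketched here as an alternative and not taken from the thread: keep the one-row-per-user result as a Series indexed by user, then look up each row's user in it with Series.map. The name joined is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"lang": ["Python", "Scala", "R", "Julia", "Java"],
                   "user": ["Mike", "Mike", "John", "Michael", "Michael"]})

# One value per user, indexed by user: "lang1, lang2, ..."
joined = df.groupby("user")["lang"].agg(", ".join)

# Map each row's user onto its joined languages; row order and index are unchanged
df["lang"] = df["user"].map(joined)
print(df)
```

This needs an extra lookup step compared with a single groupby call, but it keeps the aggregated table around if you also need it on its own.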

1 answer:

Answer 0 (score: 4)

Use transform to "broadcast" the groupby result back to every row of the input:

df['lang'] = df.groupby('user')['lang'].transform(', '.join)
df
            lang     user
0  Python, Scala     Mike
1  Python, Scala     Mike
2              R     John
3    Julia, Java  Michael
4    Julia, Java  Michael
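To make the difference between the two calls concrete, the sketch below (an illustration, not part of the original answer) runs both on the same data. The aggregation returns one value per group, indexed by user, while transform returns a Series with the same length and index as df, which is why it can be assigned straight back as a column. The names per_user and per_row are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"lang": ["Python", "Scala", "R", "Julia", "Java"],
                   "user": ["Mike", "Mike", "John", "Michael", "Michael"]})

# Aggregation: one row per group, indexed by the group key
per_user = df.groupby("user")["lang"].agg(", ".join)

# transform: the same per-group string, repeated on every row of its group
# and aligned to df's original index
per_row = df.groupby("user")["lang"].transform(", ".join)

print(len(per_user), len(per_row))      # 3 5
print(per_row.index.equals(df.index))   # True

df["lang"] = per_row
print(df)
```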