How to combine multiple text fields in a pandas DataFrame

Time: 2019-09-03 16:36:44

Tags: python pandas

How can I combine the unique values of several text columns of a pandas DataFrame into a single column? For example:

import pandas as pd

data = [[1,"US","California","Los Angeles"],
        [1,"US","California","San Francisco"],
        [1,"US","California","San Diego"],
        [1,"US","Texas","Austin"],
        [2,"IND","Maharashtra","Mumbai"],
        [2,"IND","Maharashtra","Pune"],
        [2,"IND","Maharashtra","Nagpur"]]

df = pd.DataFrame(data, columns = ['Country_Id', 'Country','State','Place'])

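How can I generate output from the DataFrame above in which one field is Country_Id and the second is a text field containing the unique values of Country, State, and Place?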

Like this:

  • 1, US California Texas Los Angeles San Francisco San Diego Austin
  • 2, IND Maharashtra Mumbai Pune Nagpur

Please ignore the meaning of the combined text field.

1 answer:

Answer 0: (score: 2)

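Use groupby with apply, and inside it a double join over unique in a generator expression: the inner join combines the unique values of each column, and the outer join combines the columns.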

# Parentheses let the method chain continue onto the next line
(df.groupby('Country_Id').apply(lambda x: ' '.join(' '.join(x[col].unique()) for col in x))
                         .to_frame('Country-State-Place'))


Out[434]:
                                                       Country-State-Place
Country_Id
1           US California Texas Los Angeles San Francisco San Diego Austin
2           IND Maharashtra Mumbai Pune Nagpur
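
The same result can also be sketched without relying on how apply handles the grouping column, which has differed between pandas versions. The variant below is only one possible approach, not the answerer's method: it selects the text columns explicitly (the names text_cols, per_column and combined are illustrative), joins the unique values of each column per group, then joins the columns row by row.

```python
import pandas as pd

data = [[1, "US", "California", "Los Angeles"],
        [1, "US", "California", "San Francisco"],
        [1, "US", "California", "San Diego"],
        [1, "US", "Texas", "Austin"],
        [2, "IND", "Maharashtra", "Mumbai"],
        [2, "IND", "Maharashtra", "Pune"],
        [2, "IND", "Maharashtra", "Nagpur"]]

df = pd.DataFrame(data, columns=["Country_Id", "Country", "State", "Place"])

# Illustrative name: the text columns to combine (the grouping key is left out).
text_cols = ["Country", "State", "Place"]

# Step 1: per group and per column, join the unique values in order of appearance.
per_column = df.groupby("Country_Id")[text_cols].agg(lambda s: " ".join(s.unique()))

# Step 2: join the per-column strings across each row into a single text field.
combined = per_column.apply(" ".join, axis=1).to_frame("Country-State-Place")

print(combined)
```

Selecting the columns up front keeps the integer Country_Id values out of the string join, so the code does not depend on apply retrying without the grouping column.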