How can I combine the unique values of several text columns of a pandas DataFrame into a single column? For example:
import pandas as pd

data = [[1, "US", "California", "Los Angeles"],
        [1, "US", "California", "San Francisco"],
        [1, "US", "California", "San Diego"],
        [1, "US", "Texas", "Austin"],
        [2, "IND", "Maharashtra", "Mumbai"],
        [2, "IND", "Maharashtra", "Pune"],
        [2, "IND", "Maharashtra", "Nagpur"]]
df = pd.DataFrame(data, columns=['Country_Id', 'Country', 'State', 'Place'])
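How can I generate output from the DataFrame above in which one field is Country_Id and a second field is a text field containing the unique values of Country, State, and Place?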
Please ignore the meaning of the combined text field.
Answer 0 (score: 2)
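Use groupby with apply, and inside it a generator expression that takes the unique values of each column. join is applied twice: once to join the unique values within a column, and once to join the per-column results.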
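# Select only the text columns so the integer Country_Id is never passed to join
(df.groupby('Country_Id')[['Country', 'State', 'Place']]
   .apply(lambda x: ' '.join(' '.join(x[col].unique()) for col in x))
   .to_frame('Country-State-Place'))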
Out[434]:
            Country-State-Place
Country_Id
1           US California Texas Los Angeles San Francisco San Diego Austin
2           IND Maharashtra Mumbai Pune Nagpur
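
For comparison, here is a minimal self-contained sketch of the same idea split into two steps, which may be easier to read than the nested join. It is not part of the original answer: the names text_cols, per_col, and result are illustrative, and it assumes the three text columns contain only strings with no missing values.

```python
import pandas as pd

data = [[1, "US", "California", "Los Angeles"],
        [1, "US", "California", "San Francisco"],
        [1, "US", "California", "San Diego"],
        [1, "US", "Texas", "Austin"],
        [2, "IND", "Maharashtra", "Mumbai"],
        [2, "IND", "Maharashtra", "Pune"],
        [2, "IND", "Maharashtra", "Nagpur"]]
df = pd.DataFrame(data, columns=['Country_Id', 'Country', 'State', 'Place'])

# Illustrative name: the text columns whose unique values should be combined.
text_cols = ['Country', 'State', 'Place']

# Step 1: within each Country_Id group, join the unique values of each column.
# unique() keeps values in order of first appearance.
per_col = df.groupby('Country_Id')[text_cols].agg(lambda s: ' '.join(s.unique()))

# Step 2: join the per-column strings across each row into one text field.
result = per_col.apply(' '.join, axis=1).to_frame('Country-State-Place')

print(result)
```

The intermediate per_col frame keeps one string per column, which can be handy if you later want a different separator between columns than between values.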