Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
                   'col': ['A', 'B', 'A', 'B', 'A', 'B']})
df.sort_values(by='col', inplace=True)
df
Out[62]:
  col                   name
0   A  [one two, three four]
2   A                     []
4   A              [one two]
1   B                  [one]
3   B                     []
5   B                [three]
I want to add a column that tracks all the unique strings contained in name for each col group in df. That is, the expected output is
Out[62]:
  col                   name            unique_list
0   A  [one two, three four]  [one two, three four]
2   A                     []  [one two, three four]
4   A              [one two]  [one two, three four]
1   B                  [one]           [one, three]
3   B                     []           [one, three]
5   B                [three]           [one, three]
Indeed, for group A, you can see that the set of unique strings contained in [one two, three four], [] and [one two] is [one two, three four].
I can get the number of unique values per group with:
df['count_unique']=df.groupby('col')['name'].transform(lambda x: list(pd.Series(x.apply(pd.Series).stack().reset_index(drop=True, level=1).nunique())))
df
Out[65]:
  col                   name  count_unique
0   A  [one two, three four]             2
2   A                     []             2
4   A              [one two]             2
1   B                  [one]             2
3   B                     []             2
5   B                [three]             2
This follows the approach from "Pandas : how to get the unique number of values in cells when cells contain lists?", which relies on nunique. However, replacing nunique with unique fails.
Any ideas? Thanks!
Answer 0 (score: 2)
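Try:
from functools import reduce  # reduce is no longer a builtin in Python 3
# Concatenate the lists in each group, then deduplicate with set()
uniq_df = df.groupby('col')['name'].apply(lambda x: list(set(reduce(lambda y,z: y+z,x)))).reset_index()
uniq_df.columns = ['col','uniq_list']
# Attach each group's list back to the original rows
pd.merge(df,uniq_df, on='col', how='left')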
This gives the desired output:
  col                   name              uniq_list
0   A  [one two, three four]  [one two, three four]
1   A                     []  [one two, three four]
2   A              [one two]  [one two, three four]
3   B                  [one]           [three, one]
4   B                     []           [three, one]
5   B                [three]           [three, one]
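Note that set() does not guarantee element order, and the merge resets the index to 0..5. If either matters, a variation along the following lines may help. This is only a sketch, not part of the original answer: the helper name unique_strings and the per_group dictionary are made up for illustration, and it assumes every cell in name is a list of hashable strings.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
    'col': ['A', 'B', 'A', 'B', 'A', 'B'],
})


def unique_strings(lists):
    """Flatten an iterable of lists and drop duplicates, keeping first-seen order."""
    # dict keys preserve insertion order (Python 3.7+), unlike set().
    return list(dict.fromkeys(chain.from_iterable(lists)))


# Build one list of unique strings per group, keyed by the value of 'col'.
per_group = {key: unique_strings(group) for key, group in df.groupby('col')['name']}

# Map each row's group key to its list; the original index is left untouched.
df['unique_list'] = df['col'].map(per_group)
```

Rows in the same group end up pointing at the same list object, so copy the lists if you plan to mutate them per row.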
Answer 1 (score: 2)