I'm trying to get, for each unique value in one column of a pandas DataFrame, the list of strings from another column:
import pandas as pd
catalog = {'code': ['A001', 'A001', 'A001', 'A002', 'A002'], 'title': ['director', 'president', 'vice president', 'sales director', 'sales vice president']}
catalog = pd.DataFrame(catalog)
## unique column values ##
codes = catalog['code'].unique()
for code in codes:
    titles = catalog[catalog == code]['title'].tolist()
    print(titles)
This gives the following output:
[nan, nan, nan, nan, nan]
[nan, nan, nan, nan, nan]
The expected output would look like this:
['director', 'president', 'vice president']
['sales director', 'sales vice president']
What am I missing? Is there another way to accomplish this?
Answer 0 (score: 4)
Try:
catalog.groupby('code')['title'].unique()
code
A001 [director, president, vice president]
A002 [sales director, sales vice president]
Name: title, dtype: object
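To get the printed lists from the question out of this result, something like the following should work. It is a minimal sketch: the name `titles_by_code` is only illustrative, and it assumes each grouped value is an array that supports `.tolist()`.

```python
import pandas as pd

catalog = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002', 'A002'],
    'title': ['director', 'president', 'vice president',
              'sales director', 'sales vice president'],
})

# One entry per code; each value holds that code's unique titles as an array
grouped = catalog.groupby('code')['title'].unique()

# Convert each array to a plain Python list (titles_by_code is an illustrative name)
titles_by_code = {code: titles.tolist() for code, titles in grouped.items()}

for code, titles in titles_by_code.items():
    print(titles)
```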
Answer 1 (score: 3)
Rather than iterating over the unique codes, it's easier to use groupby:
catalog.groupby("code").title.apply(list)
code
A001 [director, president, vice president]
A002 [sales director, sales vice president]
Name: title, dtype: object
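Note that `apply(list)` keeps repeated titles, whereas `unique()` drops them. With the question's data the two give the same lists, so here is a small sketch on made-up data (the `dup` frame is hypothetical, not from the question) where they differ:

```python
import pandas as pd

# Hypothetical data in which 'director' appears twice under A001
dup = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002'],
    'title': ['director', 'director', 'president', 'sales director'],
})

# apply(list) collects every row of the group, duplicates included
as_list = dup.groupby('code')['title'].apply(list)

# unique() keeps only the first occurrence of each title within the group
as_unique = dup.groupby('code')['title'].unique()

print(as_list['A001'])
print(list(as_unique['A001']))
```

Which one you want depends on whether repeated titles are meaningful in your data.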
Answer 2 (score: 3)
The problem with your code is that, when assigning the titles variable, you compare the whole DataFrame to code instead of comparing just the 'code' column:
for code in codes:
    titles = catalog[catalog['code'] == code]['title'].tolist()
    print(titles)
Or:
for code in codes:
    titles = catalog.loc[catalog['code'] == code, 'title'].tolist()
    print(titles)
['director', 'president', 'vice president']
['sales director', 'sales vice president']
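To see why the original version printed only nan, it may help to compare the two masks directly. This is a small sketch using the question's data; the names `df_mask`, `col_mask`, `masked` and `filtered` are just illustrative.

```python
import pandas as pd

catalog = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002', 'A002'],
    'title': ['director', 'president', 'vice president',
              'sales director', 'sales vice president'],
})

# Comparing the whole DataFrame gives a boolean DataFrame of the same shape
df_mask = catalog == 'A001'

# Comparing one column gives a boolean Series with one entry per row
col_mask = catalog['code'] == 'A001'

# A DataFrame mask keeps every row and replaces non-matching cells with NaN,
# so the 'title' column (which never equals a code) becomes all NaN
masked = catalog[df_mask]

# A Series mask selects only the rows where it is True
filtered = catalog[col_mask]

print(masked['title'].tolist())
print(filtered['title'].tolist())
```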