How do I get a list of the unique elements of a Pandas DataFrame column?

Time: 2021-04-22 14:54:17

Tags: python pandas dataframe

I am trying to get a list of the unique strings in a Pandas DataFrame column:

import pandas as pd

catalog = {'code': ['A001', 'A001', 'A001', 'A002', 'A002'], 'title': ['director', 'president', 'vice president', 'sales director', 'sales vice president']}

catalog=pd.DataFrame(catalog)

## unique column values ##
codes = catalog['code'].unique()

for code in codes:
    titles = catalog[catalog == code]['title'].tolist()
    print(titles)

This gives the following output:

[nan, nan, nan, nan, nan]
[nan, nan, nan, nan, nan]

The expected output would look like this:

['director', 'president', 'vice president']
['sales director', 'sales vice president']

What am I missing? Is there another way to accomplish this?

3 answers:

Answer 0 (score: 4)

Try

catalog.groupby('code')['title'].unique()
code
A001     [director, president, vice president]
A002    [sales director, sales vice president]
Name: title, dtype: object

Answer 1 (score: 3)

Rather than iterating over the unique codes, it is easier to use groupby:

catalog.groupby("code").title.apply(list)

code
A001    [director, president, vice president]
A002    [sales director, sales vice president]
Name: title, dtype: object
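
The two groupby answers give the same result for the sample data, but they can differ when a title repeats within a code. A minimal sketch, using a hypothetical variant of the catalog with a duplicated 'director' row (this data is an assumption, not from the question), and converting each result to a plain dict of lists:

```python
import pandas as pd

# Hypothetical data: 'director' appears twice under A001.
catalog = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002'],
    'title': ['director', 'director', 'president', 'sales director'],
})

grouped = catalog.groupby('code')['title']

# unique() drops repeated titles within each group;
# apply(list) turns each group's array into a plain list.
unique_titles = grouped.unique().apply(list).to_dict()

# apply(list) on the groups keeps every row, duplicates included.
all_titles = grouped.apply(list).to_dict()

print(unique_titles)
print(all_titles)
```

Which one to use depends on whether repeated titles should be kept.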

Answer 2 (score: 3)

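The problem with your code is that, when assigning the titles variable, you compare the code against the entire DataFrame rather than against the 'code' column: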

for code in codes:
    titles = catalog[catalog['code'] == code]['title'].tolist()
    print(titles)

Or:

for code in codes:
    titles = catalog.loc[catalog['code'] == code,'title'].tolist()
    print(titles)

['director', 'president', 'vice president']
['sales director', 'sales vice president']
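
To see why the original loop printed only nan, it may help to look at the two masks side by side. This is a small sketch using the question's data; the variable names cell_mask, masked, row_mask and rows are just illustrative:

```python
import pandas as pd

catalog = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002', 'A002'],
    'title': ['director', 'president', 'vice president',
              'sales director', 'sales vice president'],
})

# Comparing the whole DataFrame gives a boolean DataFrame: one flag per cell.
cell_mask = catalog == 'A001'

# Indexing with a boolean DataFrame keeps the shape and replaces every
# False cell with NaN. No title equals 'A001', so the title column is all NaN.
masked = catalog[cell_mask]
print(masked['title'].tolist())

# Comparing a single column gives a boolean Series: one flag per row,
# so indexing with it selects whole rows.
row_mask = catalog['code'] == 'A001'
rows = catalog[row_mask]
print(rows['title'].tolist())
```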