How to extract elements from lists in a pandas column and add them to a set

Time: 2019-11-23 13:08:20

Tags: python pandas

I have a pandas DataFrame with a column, df['category'], that looks like this:

0         ['ACCESSORIES', 'AUDIO', 'LOUNGE']
1         ['ACCESSORIES', 'MAJOR APPLIANCES', 'VISUAL']
2         ['BEDROOM SUITES', 'COMPUTERS', 'COMPUTERS', 'HOME OFFICE', 'HOME OFFICE', 'MAJOR APPLIANCES', 'VISUAL']
3         ['BEDDING', 'MAJOR APPLIANCES', 'MAJOR APPLIANCES', 'SMALL APPLIANCES', 'SMALL APPLIANCES']
4         ['PATIO']
5         ['MAJOR APPLIANCES', 'SMALL APPLIANCES']
6         ['ACCESSORIES', 'MAJOR APPLIANCES', 'MAJOR APPLIANCES', 'SMALL APPLIANCES', 'SMALL APPLIANCES', 'VISUAL', 'VISUAL']

I need to go through the entire column of 37,000 rows and add every item to a set, because I don't want duplicate values. I tried:

categories = set()
categories = df['category'].apply(lambda a: set(a))

This returns a pandas Series that looks like this:

0       {'AUDIO', 'LOUNGE', 'ACCESSORIES'}
1       {'MAJOR APPLIANCES', 'ACCESSORIES', 'VISUAL'}
2       {'BEDROOM SUITES', 'COMPUTERS', 'HOME OFFICE', 'MAJOR APPLIANCES', 'VISUAL'}
3       {'BEDDING', 'MAJOR APPLIANCES', 'SMALL APPLIANCES'}
4       {'PATIO'}
5       {'MAJOR APPLIANCES', 'SMALL APPLIANCES'}
6       {'ACCESSORIES', 'MAJOR APPLIANCES', 'SMALL APPLIANCES', 'VISUAL'}

As described above, what I actually need is a single list containing only the unique values, like this:

[AUDIO, ACCESSORIES, BEDROOM, COMPUTERS, LOUNGE, MAJOR APPLIANCES, ... , VISUAL]

2 Answers:

Answer 0 (score: 5)

How about this?

set(df['category'].sum())

Or this:

result = set()
df['category'].apply(result.update)

# result now holds every unique category; the Series of None returned by apply can be ignored
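
Both approaches above give the right answer on the sample data. One caveat, stated as an assumption rather than a measurement: `sum()` on a column of lists concatenates them one after another, which may get slow on tens of thousands of rows because each step can copy the growing list. A minimal sketch of an alternative that flattens the rows lazily with `itertools.chain.from_iterable`; the small `df` here is made-up sample data mirroring the question, not the asker's real frame:

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data mirroring the question's 'category' column.
df = pd.DataFrame({
    "category": [
        ["ACCESSORIES", "AUDIO", "LOUNGE"],
        ["ACCESSORIES", "MAJOR APPLIANCES", "VISUAL"],
        ["PATIO"],
    ]
})

# Flatten all row lists lazily and let set() drop the duplicates.
unique_categories = set(chain.from_iterable(df["category"]))

# Sort only if a list with a stable order is needed.
unique_list = sorted(unique_categories)
print(unique_list)
```

This assumes every cell really is a Python list. If the column was read from a CSV, the cells may be strings that only look like lists and would need parsing first.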

Answer 1 (score: 1)

You could try the following:

import pandas as pd
categories = [['ACCESSORIES', 'AUDIO', 'LOUNGE'], ['ACCESSORIES', 'MAJOR APPLIANCES', 'VISUAL'], ['BEDROOM SUITES', 'COMPUTERS', 'COMPUTERS', 'HOME OFFICE', 'HOME OFFICE', 'MAJOR APPLIANCES', 'VISUAL'], ['BEDDING', 'MAJOR APPLIANCES', 'MAJOR APPLIANCES', 'SMALL APPLIANCES', 'SMALL APPLIANCES'], ['PATIO'], ['MAJOR APPLIANCES', 'SMALL APPLIANCES'], ['ACCESSORIES', 'MAJOR APPLIANCES', 'MAJOR APPLIANCES', 'SMALL APPLIANCES', 'SMALL APPLIANCES', 'VISUAL', 'VISUAL']]
df = pd.DataFrame({'category': categories})

print('pandas', pd.__version__)

sorted(set(df.category.explode()))

Result:

pandas 0.25.3

['ACCESSORIES',
 'AUDIO',
 'BEDDING',
 'BEDROOM SUITES',
 'COMPUTERS',
 'HOME OFFICE',
 'LOUNGE',
 'MAJOR APPLIANCES',
 'PATIO',
 'SMALL APPLIANCES',
 'VISUAL']
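
Note that `Series.explode()` was added in pandas 0.25, so this approach needs at least that version. A hedged variation, in case some rows hold empty lists: `explode()` turns an empty list into NaN, which would then end up in the set and break `sorted()`. The sketch below drops those first and uses `unique()` instead of `set()`; the three-row `df` is invented purely for illustration:

```python
import pandas as pd

# Hypothetical sample data; the last row is an empty list on purpose.
df = pd.DataFrame({"category": [["ACCESSORIES", "AUDIO"], ["AUDIO", "VISUAL"], []]})

# One row per list element; the empty list becomes a single NaN row.
exploded = df["category"].explode()

# Drop the NaN placeholder, then keep each value once.
unique_values = exploded.dropna().unique()

result = sorted(unique_values)
print(result)
```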