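I have a DataFrame with a column whose values are lists of strings, and I want to build a term-frequency dictionary with collections.Counter. The DataFrame looks like this: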
>>> job_title['title']
0 [responsible, caring, trustworthy, babysitter]
1 [compassionate, trustworthy, babysitter]
2 [family, looking, kindergarten, preschool, chi...
3 [babysitter, needed, 2, children, bee, cave, n...
4 [fun, patient, nonjudgemental, babysitter]
5 [responsible, interactive, intelligent, babysi...
6 [responsible, friendly, babysitter]
7 [family, looking, kindergarten, preschool, chi...
8 [family, looking, kindergarten, preschool, chi...
9 [reliable, clean, friendly, nanny]
What is the most efficient way to do this?
Answer 0 (score: 1)
I think you can flatten the lists with chain.from_iterable and then pass the result to Counter:
from itertools import chain
from collections import Counter
print(Counter(chain.from_iterable(job_title['title'])))
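Since the question asks for a term-frequency dictionary, here is a minimal self-contained sketch that wraps the same chain.from_iterable + Counter idea in a function and returns a plain dict. The function name term_frequencies and its column parameter are hypothetical, chosen only for illustration; it assumes every cell in the column is a list of strings.

```python
from collections import Counter
from itertools import chain

import pandas as pd


def term_frequencies(df: pd.DataFrame, column: str = "title") -> dict:
    """Hypothetical helper: count every token across the list-valued cells of `column`."""
    # chain.from_iterable lazily concatenates the per-row lists into one stream,
    # so no intermediate flattened list is built before counting.
    counts = Counter(chain.from_iterable(df[column]))
    # Counter is already a dict subclass; dict() returns a plain term -> count mapping.
    return dict(counts)


job_title = pd.DataFrame({"title": [
    ["responsible", "caring", "trustworthy", "babysitter"],
    ["compassionate", "trustworthy", "babysitter"],
]})
freq = term_frequencies(job_title)
print(freq)
```

If you only need the most frequent terms, you can keep the Counter object instead of converting it and call its most_common(n) method.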
Sample:
import pandas as pd
job_title = pd.DataFrame({'title':[['responsible', 'caring', 'trustworthy', 'babysitter'],
                                   ['compassionate', 'trustworthy', 'babysitter']]})
print(job_title)
title
0 [responsible, caring, trustworthy, babysitter]
1 [compassionate, trustworthy, babysitter]
print(Counter(chain.from_iterable(job_title['title'])))
Counter({'babysitter': 2, 'trustworthy': 2,
'compassionate': 1, 'responsible': 1, 'caring': 1})
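An alternative that is not part of the original answer: in pandas 0.25 and later, Series.explode turns each list element into its own row, and value_counts then tallies them. This is a sketch of that swapped-in technique for comparison only; I have not benchmarked it against the Counter approach, so which is faster on your data is something to measure.

```python
import pandas as pd

job_title = pd.DataFrame({"title": [
    ["responsible", "caring", "trustworthy", "babysitter"],
    ["compassionate", "trustworthy", "babysitter"],
]})

# explode() yields one row per list element; value_counts() counts each distinct term,
# sorted by descending frequency; to_dict() converts the result to a plain dict.
freq_pd = job_title["title"].explode().value_counts().to_dict()
print(freq_pd)
```

The resulting mapping has the same term -> count pairs as the Counter version, although the iteration order of tied terms may differ.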