Create a term frequency dictionary from a DataFrame column of lists

Time: 2017-02-14 14:15:34

Tags: python list pandas dictionary collections

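I have a DataFrame with a column whose values are lists of strings, and I want to build a term frequency dictionary from it using collections.Counter. The DataFrame looks like this: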

>>> job_title['title']
0         [responsible, caring, trustworthy, babysitter]
1               [compassionate, trustworthy, babysitter]
2      [family, looking, kindergarten, preschool, chi...
3      [babysitter, needed, 2, children, bee, cave, n...
4             [fun, patient, nonjudgemental, babysitter]
5      [responsible, interactive, intelligent, babysi...
6                    [responsible, friendly, babysitter]
7      [family, looking, kindergarten, preschool, chi...
8      [family, looking, kindergarten, preschool, chi...
9                     [reliable, clean, friendly, nanny]

What is the most efficient way to do this?

1 answer:

Answer 0 (score: 1)

I think you can flatten the lists with chain.from_iterable and then count the terms with Counter:

from itertools import chain
from collections import Counter

# flatten every list in the column into one stream of terms, then count them
print(Counter(chain.from_iterable(job_title.title)))
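To see what the flattening step does on its own, here is a minimal sketch that uses a plain list of lists in place of the `title` column (the `rows` data is hypothetical, copied from the first rows of the question). Iterating a pandas Series yields its values, which is why the same `chain.from_iterable` call works directly on `job_title.title`.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for the DataFrame column: a list of token lists.
rows = [
    ["responsible", "caring", "trustworthy", "babysitter"],
    ["compassionate", "trustworthy", "babysitter"],
    ["fun", "patient", "nonjudgemental", "babysitter"],
]

# chain.from_iterable lazily yields every token of every inner list, in order,
# so the nested structure is flattened without building intermediate lists.
flat = list(chain.from_iterable(rows))

# Counter tallies each token. It is already a dict subclass; dict() just
# converts it to a plain dict if that is what the caller needs.
term_freq = dict(Counter(flat))
print(term_freq["babysitter"])  # 3
```

Passing the iterator straight to `Counter` (without the intermediate `list`) avoids materializing the flattened list at all; it is kept here only so the flattened result can be inspected.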

Sample:

import pandas as pd

job_title = pd.DataFrame({'title':[['responsible', 'caring', 'trustworthy', 'babysitter'],
                                   ['compassionate', 'trustworthy', 'babysitter']]})

print(job_title)
                                            title
0  [responsible, caring, trustworthy, babysitter]
1        [compassionate, trustworthy, babysitter]


print(Counter(chain.from_iterable(job_title.title)))
Counter({'babysitter': 2, 'trustworthy': 2, 
         'compassionate': 1, 'responsible': 1, 'caring': 1})
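If only the most frequent terms are needed, `Counter.most_common(n)` returns them as `(term, count)` pairs sorted by descending count. A minimal sketch on the same two sample rows, using a plain list instead of the DataFrame (the `titles` name is illustrative only):

```python
from collections import Counter
from itertools import chain

# Same two sample rows as above, as a plain list of lists.
titles = [
    ["responsible", "caring", "trustworthy", "babysitter"],
    ["compassionate", "trustworthy", "babysitter"],
]

counts = Counter(chain.from_iterable(titles))

# most_common(n) returns the n highest-count (term, count) pairs,
# ordered from most to least frequent.
top_two = counts.most_common(2)
print(top_two)
```

Calling `most_common()` with no argument returns every term, which is a convenient way to get the whole frequency table in ranked order.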