Question

I have a pandas DataFrame whose "review" column contains lists of words. I need to find the frequency of the words that appear in the review column.


I tried using Counter, but it fails with an "unhashable type: 'list'" error. How can I do this?

Answer 1

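If I understand you correctly, you want to pool all the words in the "review" column and count how often each word occurs across every cell in that column.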

In that case the solution is a one-liner:

import pandas
from collections import Counter
import itertools 

df = pandas.DataFrame({'id': ['5814_8', '2381_9', '7759_3', '3630_4', '9495_8', '8196_8'], 'review':
    [['stuff', 'going', 'moment', 'mj', 've', 'started'],
    ['the', 'classic', 'war', 'worlds', '', 'timothy'],
    ['film', 'starts', 'manager', 'nicholas', 'bell'],
    ['must', 'assumed', 'praised', 'film', 'the'],
    ['superbly', 'trashy', 'wondrously', 'unpretentious'],
    ['dont', 'know', 'people', 'think', 'bad', 'movie', 'got']]})

# Flatten the per-row word lists into one stream of words, then count them
Counter(itertools.chain.from_iterable(df['review']))

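Result: Counter({'': 1, 'assumed': 1, 'bad': 1, 'bell': 1, 'classic': 1, 'dont': 1, 'film': 2, 'going': 1, 'got': 1, 'know': 1, 'manager': 1, 'mj': 1, 'moment': 1, 'movie': 1, 'must': 1, 'nicholas': 1, 'people': 1, 'praised': 1, 'started': 1, 'starts': 1, 'stuff': 1, 'superbly': 1, 'the': 2, 'think': 1, 'timothy': 1, 'trashy': 1, 'unpretentious': 1, 've': 1, 'war': 1, 'wondrously': 1, 'worlds': 1})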
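As for the error in the question: Counter stores each item it counts as a dictionary key, so passing it the column directly asks it to hash whole lists, which Python does not allow. The following is only a minimal sketch of that behaviour using a small made-up list of reviews (the `reviews` data and the variable names are illustrative, not taken from the question):

```python
from collections import Counter
from itertools import chain

# Illustrative data: each inner list stands for one cell of the "review" column
reviews = [['film', 'starts', 'bell'], ['must', 'film', 'the'], ['the', 'classic']]

# Counting the outer list fails, because each element is itself a list
# and lists cannot be used as dictionary keys
try:
    Counter(reviews)
except TypeError as exc:
    error_message = str(exc)  # the message mentions "unhashable type: 'list'"

# Flattening first hands Counter individual strings, which are hashable
word_counts = Counter(chain.from_iterable(reviews))

# most_common(n) returns the n most frequent words with their counts
top_two = word_counts.most_common(2)
```

Here `top_two` holds 'film' and 'the', each with a count of 2.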

Answer 2

You can use a list comprehension inside Counter:

Counter([i for s in df.review for i in s])
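If you would rather stay inside pandas, one possible alternative (not part of either answer, and assuming pandas 0.25 or later, where `Series.explode` is available) is to expand each list into one row per word and then call `value_counts`. The sample frame below is a shortened, illustrative version of the data used above:

```python
import pandas as pd

# Shortened, illustrative version of the sample data
df = pd.DataFrame({
    'id': ['7759_3', '3630_4', '2381_9'],
    'review': [
        ['film', 'starts', 'manager', 'nicholas', 'bell'],
        ['must', 'assumed', 'praised', 'film', 'the'],
        ['the', 'classic', 'war', 'worlds', '', 'timothy'],
    ],
})

# explode() turns every list element into its own row;
# value_counts() then tallies how often each word occurs
counts = df['review'].explode().value_counts()
```

The result is a Series indexed by word, so `counts['film']` gives the count for a single word and `counts.head(10)` the ten most frequent ones.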

Frequency of words in a DataFrame
