Counting the frequency of two-word pairs using Python

Time: 2013-09-23 06:21:49

Tags: csv python-2.7 count frequency-analysis word-frequency

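There are many resources online that show how to do a word count for single words (for example this, this, and this, among others), but I could not find a concrete example of counting the frequency of two-word pairs.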

I have a csv file that contains some strings.
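
The layout of the csv file is not shown, so the following is only a minimal sketch of loading it, assuming every cell holds plain text and that all cells can be joined into one string. The helper name `read_cells` and the file name `data.csv` are hypothetical, and the code is written for Python 3.

```python
import csv
import io

def read_cells(fileobj):
    """Join every cell of every row into one space-separated string."""
    reader = csv.reader(fileobj)
    return " ".join(cell for row in reader for cell in row)

# An in-memory stand-in for the real file, so the sketch runs on its own.
sample = io.StringIO(
    '"I love TV show makes me happy, I love also comedy"\n'
    '"show makes me feel like flying"\n'
)
text = read_cells(sample)
print(text)

# With a real file (hypothetical name) this would look like:
# with open("data.csv", newline="") as f:
#     text = read_cells(f)
```

Joining cells with a space means a pair can span two cells. If that is not wanted, each cell would have to be counted separately.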

FileList = "I love TV show makes me happy, I love also comedy show makes me feel like flying"

So I would like the output to look like this:

wordscount =  {"I love": 2, "show makes": 2, "makes me" : 2 }

Of course, I have to remove all commas, question marks, and so on: {!, , ", ', ?, ., (,), [, ], ^, %, #, @, &, *, -, _, ;, /, \, |, }
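
One possible way to strip these characters is a single regular-expression substitution. This is a sketch, not the only approach. The helper name `strip_punctuation` is hypothetical, and it assumes that anything which is not a letter, digit, or whitespace counts as punctuation.

```python
import re

def strip_punctuation(text):
    # Replace every character that is not a word character or whitespace
    # with a space. "_" counts as a word character for \w, so it is
    # matched explicitly.
    cleaned = re.sub(r"[^\w\s]|_", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(cleaned.split())

print(strip_punctuation("I love TV show, makes me happy! (really?)"))
# prints: I love TV show makes me happy really
```

Replacing punctuation with a space rather than deleting it keeps words such as "show,makes" from being glued together.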

I will also remove some stop words that I found here, just to get more specific data from the text.
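
Stop-word removal could be sketched as a simple filter over the word list. The stop-word set below is a made-up placeholder, not the list referred to above, and the helper name `remove_stop_words` is hypothetical.

```python
# Placeholder list for illustration only; substitute the real stop words.
STOP_WORDS = {"a", "an", "the", "also", "and", "of"}

def remove_stop_words(words, stop_words=STOP_WORDS):
    # Compare in lower case so "The" and "the" are treated alike,
    # but keep the original spelling of the words that survive.
    return [w for w in words if w.lower() not in stop_words]

print(remove_stop_words("I love also comedy show".split()))
# prints: ['I', 'love', 'comedy', 'show']
```

Filtering before pairing makes words adjacent that were not adjacent in the text ("love comedy" in the example). Whether that is desirable depends on the analysis.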

How can I achieve this result using Python?

Thanks!

1 Answer:

Answer 0: (score: 2)

>>> from collections import Counter
>>> import re
>>> 
>>> sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
>>> words = re.findall(r'\w+', sentence)
>>> two_words = [' '.join(ws) for ws in zip(words, words[1:])]
>>> wordscount = {w:f for w, f in Counter(two_words).most_common() if f > 1}
>>> wordscount
{'show makes': 2, 'makes me': 2, 'I love': 2}
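
The session above works by zipping the word list with a copy of itself shifted by one position, so each word is paired with its successor. The same idea can be wrapped in a reusable function. This is a sketch, and the name `bigram_counts` and the `min_count` parameter are assumptions for illustration, not part of the original answer.

```python
from collections import Counter
import re

def bigram_counts(text, min_count=2):
    # \w+ keeps runs of letters, digits and underscores, which drops
    # commas and other punctuation between words.
    words = re.findall(r"\w+", text)
    # zip(words, words[1:]) yields (w0, w1), (w1, w2), ... i.e. every
    # pair of adjacent words.
    counts = Counter(" ".join(pair) for pair in zip(words, words[1:]))
    # Keep only the pairs seen at least min_count times.
    return {pair: n for pair, n in counts.items() if n >= min_count}

sentence = ("I love TV show makes me happy, I love also comedy "
            "show makes me feel like flying")
result = bigram_counts(sentence)
print(result)
```

For the sample sentence this returns the three pairs "I love", "show makes" and "makes me", each with a count of 2. The matching is case-sensitive, so "I love" and "i love" would be counted separately unless the text is lowercased first.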