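There are plenty of resources online showing how to count the frequency of single words, but I could not find a concrete example of counting the frequency of two-word pairs.
I have a CSV file that contains some strings.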
FileList = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
So I would like the output to look like this:
wordscount = {"I love": 2, "show makes": 2, "makes me" : 2 }
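Of course, I have to strip all commas, question marks, and other punctuation: {!, , ", ', ?, ., (,), [, ], ^, %, #, @, &, *, -, _, ;, /, \, |, }
I will also remove some stop words, taken from a list I found, just to get more specific data out of the text.
How can I achieve this result in Python?
Thanks!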
Answer 0 (score: 2)
>>> from collections import Counter
>>> import re
>>>
>>> sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
>>> words = re.findall(r'\w+', sentence)
>>> two_words = [' '.join(ws) for ws in zip(words, words[1:])]
>>> wordscount = {w:f for w, f in Counter(two_words).most_common() if f > 1}
>>> wordscount
{'show makes': 2, 'makes me': 2, 'I love': 2}
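The session above does not cover the stop-word and underscore requirements from the question, so here is a minimal sketch of one way to add them. The names `bigram_counts`, `STOP_WORDS`, and `min_count` are made up for illustration, and the stop-word set is only a placeholder for whatever list you actually use. The pattern `[^\W_]+` is used instead of `\w+` so that underscores are treated as separators too.

```python
import re
from collections import Counter

# Placeholder stop-word set (an assumption); swap in your own list.
STOP_WORDS = {"also"}


def bigram_counts(text, stop_words=STOP_WORDS, min_count=2):
    """Count adjacent word pairs, keeping those seen at least min_count times."""
    # [^\W_]+ matches runs of letters and digits, so punctuation
    # (including the underscore) acts as a separator and is dropped.
    words = [w for w in re.findall(r"[^\W_]+", text)
             if w.lower() not in stop_words]
    # Pair each word with its successor and count the joined pairs.
    pairs = Counter(" ".join(pair) for pair in zip(words, words[1:]))
    return {pair: n for pair, n in pairs.most_common() if n >= min_count}


sentence = ("I love TV show makes me happy, "
            "I love also comedy show makes me feel like flying")
print(bigram_counts(sentence))
```

Two caveats: stop words are removed before pairing, so words that were separated by a stop word become neighbours (here "love comedy"), and an apostrophe splits a word, so "don't" becomes "don" and "t". For a CSV file you could apply the same function to each text cell and sum the resulting counts.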