How can I count the number of times a word occurs in a list of strings?
For example:
['This is a sentence', 'This is another sentence']
The result for the word "sentence" would be 2.
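Answer 0 (score: 9)
Use a collections.Counter() object and split the sentences into words on whitespace. You probably also want to lowercase the words and strip punctuation: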
from collections import Counter
counts = Counter()
for sentence in sequence_of_sentences:
    counts.update(word.strip('.,?!"\'').lower() for word in sentence.split())
Or perhaps use a regular expression that matches only word characters:
from collections import Counter
import re
counts = Counter()
words = re.compile(r'\w+')
for sentence in sequence_of_sentences:
    counts.update(words.findall(sentence.lower()))
You now have a counts dictionary holding the number of occurrences of each word.
Demo:
>>> sequence_of_sentences = ['This is a sentence', 'This is another sentence']
>>> from collections import Counter
>>> counts = Counter()
>>> for sentence in sequence_of_sentences:
...     counts.update(word.strip('.,?!"\'').lower() for word in sentence.split())
...
>>> counts
Counter({'this': 2, 'is': 2, 'sentence': 2, 'a': 1, 'another': 1})
>>> counts['sentence']
2
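If only a single word's count is needed, the regex variant above can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the names count_word and WORD_RE are made up for illustration, and it assumes case-insensitive matching is what you want.

```python
import re
from collections import Counter

# Matches runs of word characters (letters, digits, underscore)
WORD_RE = re.compile(r'\w+')


def count_word(sentences, word):
    """Return how often `word` occurs across `sentences`, ignoring case.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(WORD_RE.findall(sentence.lower()))
    # A Counter returns 0 for a missing key instead of raising KeyError
    return counts[word.lower()]


sentences = ['This is a sentence', 'This is another sentence']
print(count_word(sentences, 'sentence'))  # 2
print(count_word(sentences, 'missing'))   # 0
```

Because Counter returns 0 for keys it has never seen, there is no need to guard the lookup with try/except.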
Answer 1 (score: 3)
You can easily accomplish what you want with a bit of regex and a dictionary.
import re
word_counts = {}  # not named `dict`, which would shadow the built-in type
sentence_list = ['This is a sentence', 'This is a sentence']
for sentence in sentence_list:
    for word in re.split(r'\s', sentence):  # split on whitespace
        try:
            word_counts[word] += 1
        except KeyError:
            word_counts[word] = 1
print(word_counts)
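One caveat with splitting on a single whitespace character: two spaces in a row produce an empty string, which then gets counted as a word. A possible alternative, sketched here with a hypothetical count_words function, is to use str.split() with no arguments and dict.get() in place of the try/except. Like the loop above, it is case-sensitive and does not strip punctuation.

```python
def count_words(sentence_list):
    """Count case-sensitive, whitespace-separated words using a plain dict.

    Hypothetical helper for illustration only.
    """
    word_counts = {}
    for sentence in sentence_list:
        # str.split() with no arguments collapses runs of whitespace
        # and never yields empty strings
        for word in sentence.split():
            word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts


# Note the double space in the second sentence
result = count_words(['This is a sentence', 'This  is a sentence'])
print(result)  # {'This': 2, 'is': 2, 'a': 2, 'sentence': 2}
```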