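I built a text string, removed all non-letter symbols, and put spaces between the words. But when I add the contents to a dictionary to count word frequency, it counts letters instead. How do I count the words in a dictionary?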
dictionary = {}
for item in text_string:
    if item in dictionary:
        dictionary[item] = dictionary[item]+1
    else:
        dictionary[item] = 1
print(dictionary)
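Answer 0 (score: 3)
Change this
for item in text_string:
to this
for item in text_string.split():
The .split() function splits a string into words, using whitespace characters (including tabs and newlines) as delimiters. Iterating over the string directly yields one character at a time, which is why letters were being counted.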
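To make the difference concrete, here is a minimal sketch comparing the two loops' inputs. The sample text is made up for illustration and is not the asker's data.

```python
# Made-up sample text containing a double space, a tab, and a newline.
text_string = "the  cat\tsat on\nthe mat"

# Iterating over a str yields one-character strings, not words.
chars = list(text_string)[:3]

# split() with no argument splits on any run of whitespace
# and never produces empty strings.
words = text_string.split()

print(chars)
print(words)
```

Note that calling split(' ') with an explicit space behaves differently: it would leave the tab and newline inside the words and produce an empty string for the double space.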
Answer 1 (score: 1)
You are very close. Since you say your words are already separated by spaces, you need to use str.split to make a list of words.
Here is an example:
dictionary = {}
text_string = 'there are repeated words in this sring with many words many are repeated'
for item in text_string.split():
    if item in dictionary:
        dictionary[item] = dictionary[item]+1
    else:
        dictionary[item] = 1
print(dictionary)
{'there': 1, 'are': 2, 'repeated': 2, 'words': 2, 'in': 1,
'this': 1, 'sring': 1, 'with': 1, 'many': 2}
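As a possible variation on the loop above, the if/else can be collapsed with dict.get, which returns a default when the key is missing. This is only a sketch; the function name count_words is hypothetical and does not come from the question.

```python
def count_words(text):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = {}
    for word in text.split():
        # get() returns 0 for a word not seen yet, so no if/else is needed.
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words('many words many are repeated'))
```

The behavior is the same as the explicit if/else version; the choice between them is a matter of taste.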
Another solution is to use collections.Counter, which is available in the standard library:
from collections import Counter
text_string = 'there are repeated words in this sring with many words many are repeated'
c = Counter(text_string.split())
print(c)
Counter({'are': 2, 'repeated': 2, 'words': 2, 'many': 2, 'there': 1,
'in': 1, 'this': 1, 'sring': 1, 'with': 1})
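If only the most frequent words matter, Counter also provides most_common. A short sketch, using a made-up sentence rather than the one above:

```python
from collections import Counter

# Made-up sample text, chosen so the top two counts are not tied.
text_string = 'many words many are repeated many words'
c = Counter(text_string.split())

# most_common(n) returns the n highest counts as (word, count) pairs.
top_two = c.most_common(2)
print(top_two)

# Unlike a plain dict, a Counter returns 0 for a word it has not seen.
print(c['missing'])
```

Because missing keys read as 0, there is no need to check membership before looking up a word's count.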