Question

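I'm trying to count how often each word is used. If I say "hi i am nick", every word gets a count of 1. I followed the book, but when I try something like "i am high as a kite", I get 3 for "i" and for "a". Is there a way to count "i" and "a" only as words on their own?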

txt = "i am high as a kite"

x = txt.split(" ")

for num_of_instances in x:
    count = txt.count(num_of_instances)
    print(num_of_instances, count)
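The 3 comes from str.count, which counts substring occurrences: "i" also matches inside "high" and "kite", and "a" matches inside "am" and "as". list.count on the split words compares whole elements instead. A minimal side-by-side comparison using the sentence from the question:

```python
txt = "i am high as a kite"
words = txt.split(" ")

# str.count counts substrings, so "i" also matches inside "high" and "kite"
substring_counts = {w: txt.count(w) for w in words}

# list.count compares whole list elements, so only the standalone word matches
word_counts = {w: words.count(w) for w in words}

print(substring_counts["i"], substring_counts["a"])  # 3 3
print(word_counts["i"], word_counts["a"])            # 1 1
```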

Answer 1

Just use:

x.count(num_of_instances)

instead of:

txt.count(num_of_instances)

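Still, this will print repeated words more than once in a sentence like "to be or not to be" ("to" and "be" are each printed twice, with a count of 2 both times). It is better to loop over a set, which removes the duplicates (but you lose the order in which the words appear):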

txt = "to be or not to be"

x = txt.split(" ")

for num_of_instances in set(x):
    count = x.count(num_of_instances)
    print(num_of_instances, count)

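Output (the order may change each time the code is run, because the iteration order of a set of strings is not fixed between runs):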

be 2
to 2
not 1
or 1
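If you want to drop the duplicates but keep the order in which words first appear, one option (an alternative to the set, not part of the original answer) is dict.fromkeys, since dicts preserve insertion order in Python 3.7 and later:

```python
txt = "to be or not to be"
x = txt.split(" ")

# dict.fromkeys keeps only the first occurrence of each word, in order
unique_in_order = list(dict.fromkeys(x))

for word in unique_in_order:
    # list.count rescans the whole list for every distinct word
    print(word, x.count(word))
```

This prints "to 2", "be 2", "or 1", "not 1" in that order on every run. Each x.count call rescans the whole list, which is fine for a sentence but is one reason the Counter approach scales better on long texts.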

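Even better, use a Counter object, which counts every word in a single pass and (in Python 3.7+) keeps the words in the order they first appear: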

from collections import Counter
txt = "to be or not to be"
x = Counter(txt.split(" "))

for word, count in x.items():
    print(word, count)

Output:

to 2
be 2
or 1
not 1
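Counter also has a most_common method that returns (word, count) pairs sorted from most to least frequent, which is often what a word-frequency program wants to print. A short sketch on the same sentence:

```python
from collections import Counter

txt = "to be or not to be"
x = Counter(txt.split(" "))

# most_common(n) returns the n most frequent (word, count) pairs;
# words with equal counts stay in the order they were first seen
top_two = x.most_common(2)
for word, count in top_two:
    print(word, count)
```

Calling most_common() with no argument returns every pair, highest count first.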

Answer 2

May I suggest the collections module, which ships with Python's standard library?

>>> import collections
>>> text = 'i am high as a kite'
>>> word_count = collections.Counter(text.split())
>>> word_count
Counter({'i': 1, 'am': 1, 'high': 1, 'as': 1, 'a': 1, 'kite': 1})
>>> character_count = collections.Counter(text)
>>> character_count
Counter({' ': 5, 'i': 3, 'a': 3, 'h': 2, 'm': 1, 'g': 1, 's': 1, 'k': 1, 't': 1, 'e': 1})
>>>

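It contains a class named Counter, which was built specifically to count things for you. Counter is a subclass of the built-in dict type, so its interface is much like a dict's. Its documentation is in the collections module section of the Python standard library reference.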
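Because Counter is a dict subclass, you can look up a single word's count with ordinary indexing; unlike a plain dict, a missing key gives 0 instead of raising KeyError. A short sketch using the sentence from the question (the extra sentence passed to update is made up for illustration):

```python
import collections

text = 'i am high as a kite'
word_count = collections.Counter(text.split())

print(word_count['i'])     # 1: "i" is counted only as a whole word
print(word_count['nick'])  # 0: missing keys return 0, no KeyError

# update() adds the counts from another iterable of words
word_count.update('a kite is a kite'.split())
print(word_count['kite'])  # 3
print(word_count['a'])     # 3
```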

Trying to find the frequency of words. Is there a way to count letters as words of their own?
