Question

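I need to implement a feature that analyzes how often each word occurs. I tried the code below, but the output is scattered and repetitive. Is there a way to group this data together so that each word's count is shown once instead of many times?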

file = "PartA"
f = open(file, 'r')
wordstring = f.read()

wordlist = wordstring.split()

wordfreq = []

for w in wordlist:
    wordfreq.append(wordlist.count(w))

print("String\n" + wordstring +"\n")
print("list\n"+ str(wordlist) + "\n")
print("Frequencies\n" + str(wordfreq) + "\n")

My output:

String
hey there
This is Joey
how is it going
it it it it it it it it it it it it
is is is is is is


list
['hey', 'there', 'This', 'is', 'Joey', 'how', 'is', 'it', 'going', 'it', 'it', 'it', 'it', 'it', 'it', 'it', 'it', 'it',
 'it', 'it', 'it', 'is', 'is', 'is', 'is', 'is', 'is']

Frequencies
[1, 1, 1, 8, 1, 1, 8, 13, 1, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 8, 8, 8, 8, 8, 8]

Answer 1

Counter is what you are looking for.

from collections import Counter

s  = "hey there This is Joey how is it going it it it it it it it it it it it it is is is is is is"
counter=Counter(s.split())

Counter({'it': 13, 'is': 8, 'hey': 1, 'there': 1, 'This': 1, 'Joey': 1, 'how': 1, 'going': 1})

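Also note that you call the count method for every element. This leads to O(n^2) complexity, where n is the length of the list, because the same word is counted again each time it appears. By calling count() only on the distinct words, you can do it in O(n * k), where k is the number of distinct elements (still O(n^2) in the worst case).

With a plain dictionary, you can solve the problem in linear time.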
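
As a minimal sketch of that linear-time dictionary approach (the helper name `count_words` is made up for illustration and is not part of the original answer), each word is visited exactly once:

```python
def count_words(words):
    """Count occurrences of each word in a single pass over the list."""
    freq = {}
    for word in words:
        # dict.get returns 0 the first time a word is seen
        freq[word] = freq.get(word, 0) + 1
    return freq

s = "hey there This is Joey how is it going it it it it it it it it it it it it is is is is is is"
freq = count_words(s.split())
print(freq["it"], freq["is"])  # 13 8
```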

Answer 2

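I agree that the best approach is to use the implementation in collections. If you need to implement it yourself (homework, maybe?), you can use a dictionary: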

word_freq = {}
with open(file, 'r') as f:
    word_list = f.read().split()
    for word in word_list:
        # start each new word at 0, then increment
        word_freq.setdefault(word, 0)
        word_freq[word] += 1

print(word_freq)

If you want to put this in a function (as suggested in the comments), you can do the following:

def word_count(filename, n_words=-1):
    '''return list of tuples of most frequent n words in the given file'''
    word_freq = {}
    with open(filename, 'r') as f:
        word_list = f.read().split()
        for word in word_list:
            word_freq.setdefault(word, 0)
            word_freq[word] += 1
    # a negative n_words means "return every word"
    if n_words < 0:
        n_words = len(word_freq)
    # generate pairs of words and frequencies, most frequent first
    pairs = sorted(word_freq.items(), key=lambda item: item[1], reverse=True)
    # return the first n of them as a list of tuples
    return pairs[:n_words]

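I hope this is what you wanted, but please look at the basics on dictionary sorting and read the topics related to your question, so that you can state the specific problem more precisely in your original question.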
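
For reference, sorting a dictionary by its values might be sketched as follows. The `word_freq` contents are the counts from the question's sample text, and `by_freq` is just an illustrative name. Breaking ties alphabetically is an extra assumption here, not something the question asked for.

```python
# word counts from the question's sample text
word_freq = {'hey': 1, 'there': 1, 'This': 1, 'is': 8, 'Joey': 1,
             'how': 1, 'it': 13, 'going': 1}

# Sort (word, count) pairs by descending count; words with equal counts
# are ordered alphabetically (uppercase sorts before lowercase).
by_freq = sorted(word_freq.items(), key=lambda item: (-item[1], item[0]))

print(by_freq[:2])  # [('it', 13), ('is', 8)]
```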

Answer 3

Try the code below, so that you don't count the same word more than once:

wordset = set(wordlist)
wordfreq = []
for word in wordset:
    wordfreq.append(wordlist.count(word))
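
One caveat: a plain list of counts built from a set does not record which count belongs to which word, and set iteration order is arbitrary. A possible variation, which is not part of the original answer, keeps each word paired with its count in a dictionary. The short `wordlist` here is only sample data.

```python
wordlist = "hey there This is Joey how is it going".split()

# count() still runs once per distinct word, so this is O(n * k)
wordfreq = {word: wordlist.count(word) for word in set(wordlist)}

print(wordfreq["is"])  # 2
```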
