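I need to implement a feature that analyzes how often each word occurs. I tried the code below, but the output is scattered and repetitive. Is there a way to group this data together so that each word's count is shown once instead of many times?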
file = "PartA"
f = open(file, 'r')
wordstring = f.read()
wordlist = wordstring.split()
wordfreq = []
for w in wordlist:
    wordfreq.append(wordlist.count(w))
print("String\n" + wordstring +"\n")
print("list\n"+ str(wordlist) + "\n")
print("Frequencies\n" + str(wordfreq) + "\n")
My output:
String
hey there
This is Joey
how is it going
it it it it it it it it it it it it
is is is is is is
list
['hey', 'there', 'This', 'is', 'Joey', 'how', 'is', 'it', 'going', 'it', 'it', 'it', 'it', 'it', 'it', 'it', 'it', 'it',
'it', 'it', 'it', 'is', 'is', 'is', 'is', 'is', 'is']
Frequencies
[1, 1, 1, 8, 1, 1, 8, 13, 1, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 8, 8, 8, 8, 8, 8]
Answer 0 (score: 1)
Counter is what you're looking for.
from collections import Counter
s = "hey there This is Joey how is it going it it it it it it it it it it it it is is is is is is"
counter = Counter(s.split())
print(counter)
# Counter({'it': 13, 'is': 8, 'hey': 1, 'there': 1, 'This': 1, 'Joey': 1, 'how': 1, 'going': 1})
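To print each word only once alongside its count, you can iterate over Counter.most_common(), which yields (word, count) pairs from most to least frequent. This is a minimal sketch, assuming the text has already been read into a string (for example with open("PartA").read() as in the question); print_word_counts and sample are hypothetical names used only for illustration:

```python
from collections import Counter


def print_word_counts(text):
    """Print each distinct word once with its count, most frequent first."""
    counts = Counter(text.split())
    for word, count in counts.most_common():
        print(f"{word}: {count}")
    return counts


# Hypothetical sample reproducing the question's input text
sample = "hey there This is Joey how is it going " + "it " * 12 + "is " * 6
counts = print_word_counts(sample)
```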
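Also note that you call count for every element. That gives O(n^2) complexity, where n is the length of the list, because you count the same word over and over. If you call count() only on the distinct words, you can do it in O(n*k), where k is the number of distinct elements (still O(n^2) in the worst case).
With a plain dictionary you can solve the problem in linear time.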
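A sketch of the linear-time idea: walk the list once and bump a counter in a dict for each word, so no word triggers a rescan of the whole list the way list.count() does. The function name word_frequencies and the sample input are hypothetical:

```python
def word_frequencies(words):
    """Count words in a single pass: O(n) dict updates (amortized) instead of
    one full list.count() scan per word."""
    freq = {}
    for word in words:
        # dict.get returns 0 the first time a word is seen
        freq[word] = freq.get(word, 0) + 1
    return freq


words = "hey there This is Joey how is it going it it it".split()
freq = word_frequencies(words)
print(freq)
# {'hey': 1, 'there': 1, 'This': 1, 'is': 2, 'Joey': 1, 'how': 1, 'it': 4, 'going': 1}
```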
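Answer 1 (score: 0)
I agree that the best approach is to use the implementation in collections. If you need to implement it yourself (homework, perhaps?), you can use a dictionary: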
word_freq = {}
with open(file, 'r') as f:
    word_list = f.read().split()
for word in word_list:
    # start each new word at 0, then increment
    word_freq.setdefault(word, 0)
    word_freq[word] += 1
print(word_freq)
If you want to put it in a function (as the comments suggest), you could do something like this:
def word_count(filename, n_words=-1):
    '''Return a list of (word, count) tuples for the n most frequent words in the given file.'''
    word_freq = {}
    with open(filename, 'r') as f:
        word_list = f.read().split()
    for word in word_list:
        word_freq.setdefault(word, 0)
        word_freq[word] += 1
    if n_words < 0:
        n_words = len(word_freq)
    ## generate (word, frequency) pairs sorted by frequency, most frequent first
    pairs = sorted(word_freq.items(), key=lambda item: item[1], reverse=True)
    ## return the first n of them as a list of tuples
    return pairs[:n_words]
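For comparison, here is a sketch of the same top-n query built on collections.Counter: most_common(n) already returns (word, count) tuples sorted from most to least frequent, and returns all of them when n is None. The name top_words is hypothetical, and the demo writes the question's sample text to a temporary file as a stand-in for PartA:

```python
import os
import tempfile
from collections import Counter


def top_words(filename, n_words=None):
    """Return up to n_words (word, count) tuples, most frequent first.
    With n_words=None, every distinct word is returned."""
    with open(filename, 'r') as f:
        counts = Counter(f.read().split())
    return counts.most_common(n_words)


# Demo: a temporary file standing in for "PartA" (same text as the question)
sample_text = "hey there\nThis is Joey\nhow is it going\n" + "it " * 12 + "\n" + "is " * 6 + "\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

top_two = top_words(sample_path, 2)
all_words = top_words(sample_path)
os.remove(sample_path)
print(top_two)
```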
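I hope this is what you're after, but take a look at basics on dictionary sorting and read up on topics related to your problem, so that you can state your specific question more precisely next time.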
Answer 2 (score: 0)
Try the code below, so that you don't count the same word more than once:
wordSet = set(wordlist)
wordfreq = []
for word in wordSet:
    wordfreq.append(wordlist.count(word))
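One catch with collecting only the counts: a set has no defined iteration order, so the resulting list of numbers cannot be matched back to the words. A minimal variant that keeps each word next to its count stores the pairs in a dict. It swaps set() for dict.fromkeys(), which also removes duplicates but keeps first-appearance order; the sample wordlist is hypothetical:

```python
wordlist = "hey there This is Joey how is it going it it it is".split()

# dict.fromkeys drops duplicates while keeping first-appearance order;
# list.count still runs once per distinct word, i.e. O(n * k) overall
wordfreq = {word: wordlist.count(word) for word in dict.fromkeys(wordlist)}

for word, count in wordfreq.items():
    print(word, count)
```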