Counting the frequency of words in a text file in Python

Time: 2016-12-07 07:19:14

Tags: python file text counter frequency

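I am trying to figure out how to write a program that opens a file chosen by the user (by typing its name) and counts how often each word the user enters appears in it.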

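I have most of it working, but when I enter several words for the program to look up, only the first word shows the correct frequency; the rest show "0 occurrences".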

file_name = input("What file would you like to open? ")
f = open(file_name, "r")
the_full_text = f.read()
words = the_full_text.split()
search_word = input("What words do you want to find? ").split(",")
len_list = len(search_word) 

word_number = 0
print()
print ('... analyzing ... hold on ...')
print()
print ('Frequency of word usage within', file_name+":")
for i in range(len_list):

    frequency = 0
    for word in words:
        word = word.strip(",.")
        if search_word[word_number].lower() == word.lower():
            frequency += 1
    print ("   ",format(search_word[word_number].strip(),'<20s'),"/", frequency, "occurrences")
    word_number = word_number + 1

For example, here is a sample run:

What file would you like to open? assignment_8.txt
What words do you want to find? wey, rights, dem

... analyzing ... hold on ...

Frequency of word usage within assignment_8.txt:
    wey                  / 96 occurrences
    rights               / 0 occurrences
    dem                  / 0 occurrences

What is wrong with my program? Please help :o

2 answers:

Answer 0 (score: 1)

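You need to strip the whitespace from the search words. Splitting "wey, rights, dem" on "," leaves a leading space on every word after the first, so " rights" never equals "rights".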
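
A minimal sketch of the problem, using the sample input from the question (the variable names `raw` and `cleaned` are only for illustration):

```python
# The sample input from the question, split the same way the question does.
raw = "wey, rights, dem"
search_words = raw.split(",")
print(search_words)            # ['wey', ' rights', ' dem']

# The leading space makes the comparison fail even if the word is in the file.
print(" rights" == "rights")   # False

# Stripping each term before comparing removes the stray spaces.
cleaned = [w.strip().lower() for w in search_words]
print(cleaned)                 # ['wey', 'rights', 'dem']
```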

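However, your current algorithm is quite inefficient, because it rescans the entire text once for every search word. Here is a more efficient approach. First, we clean up the search words and put them in a list. Then we build a dictionary from that list to hold the count for each word as we find it in the text file, so the text is scanned only once.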

file_name = input("What file would you like to open? ")
with open(file_name, "r") as f:
    words = f.read().split()

search_words = input("What words do you want to find? ").split(',')
search_words = [word.strip().lower() for word in search_words]
#print(search_words)
search_counts = dict.fromkeys(search_words, 0)

print ('\n... analyzing ... hold on ...')
for word in words:
    word = word.rstrip(",.").lower()
    if word in search_counts:
        search_counts[word] += 1

print ('\nFrequency of word usage within', file_name + ":")
for word in search_words:
    print("   {:<20s} / {} occurrences".format(word, search_counts[word]))
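
As a possible variation (not part of the answer above), the same single-pass idea can be sketched with `collections.Counter` from the standard library. The function name `count_search_words` and the sample text are made up for illustration; the sketch takes the text as a string instead of reading a file so it can be run on its own.

```python
from collections import Counter

def count_search_words(text, search_words):
    """Return {search_word: count} using a case-insensitive whole-word match."""
    cleaned = [w.strip().lower() for w in search_words]
    # Count every word in the text once, after trimming surrounding "," and ".".
    counts = Counter(word.strip(",.").lower() for word in text.split())
    # A Counter returns 0 for missing keys, so absent words need no special case.
    return {w: counts[w] for w in cleaned}

# Made-up sample text, only to exercise the function.
sample_text = "Dem get rights, wey dem want. Wey dem see."
result = count_search_words(sample_text, "wey, rights, dem".split(","))
print(result)  # {'wey': 2, 'rights': 1, 'dem': 3}
```

This counts every word in the text rather than only the search words, which uses more memory but makes it cheap to look up additional words afterwards.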

Answer 1 (score: 1)

There are many ways to do this. Below is a program that reads a .txt file and builds a dictionary mapping each word to its frequency; it also splits the text into sentences.

"""
Created on Fri Jun 11 17:06:52 2021

@author: Vijayendra Dwari
"""

sentences = []
wordlist = []

# Characters to remove from each line before counting
digits = "1234567890"
punc = "!@$%+^&*()<>{}[]#_-/',’"
# Stop words to ignore; a set makes "in" test whole words, not substrings
drop = set("a,is,are,when,then,an,the,we,us,upto,them,their,from,for,in,of,at,to,out,and,into,any,but,also,too,that".split(","))

FileName = input("Please enter the file name: ")
# Open the file named by the user (the variable, not the literal 'FileName')
with open(FileName, "r") as f:
    for line in f:
        line = " ".join(line.split())  # collapse runs of whitespace
        line = "".join(c for c in line if c not in digits)
        line = "".join(c for c in line if c not in punc)

        # Sentences are split on periods; words on whitespace
        sentences.append([s.strip() for s in line.split('.') if s.strip()])
        wordlist.append(line.replace('.', ' ').split())

word_dict = {}
wordcount = 0
for words in wordlist:
    for word in words:
        if word not in drop:
            word_dict[word] = word_dict.get(word, 0) + 1
            wordcount += 1

# (count, word) pairs for every word that was kept
word_freq = [(value, key) for key, value in word_dict.items()]

print(word_freq)
print(wordlist)
print(sentences)
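
One detail worth noting when filtering stop words this way: `in` applied to a string is a substring test, so a comma-separated string of stop words will also match fragments of those words. A small sketch, using a shortened, hypothetical stop-word list, shows the difference between testing against the string and against a set built from it:

```python
# Shortened stop-word list, for illustration only.
drop_string = "a,is,are,the,them,their"
drop_set = set(drop_string.split(","))

# "he" is not a stop word, but it is a substring of "the".
print("he" in drop_string)   # True  (substring match, so "he" would be dropped)
print("he" in drop_set)      # False (whole-word match)
print("the" in drop_set)     # True
```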
