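I need to write a program that counts the words in a text file.
My plan:
- the user enters the name of a .txt file,
- the app loads it into the variable 'text',
- it converts the text to lowercase,
- it keeps only words without characters such as '/' or '#' and without spaces, i.e. strings made of letters only,
- it builds a list of words from them,
- it displays all the words: the most frequently used word first, the least used (at least once) last.
How can I change it to include only words with a minimum length of 3? Example: in, on, at <- should not be included; list, words, show, clear <- should be included.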
from collections import Counter
import re


def open_file():
    file_name = input("Enter a filename: ")  # name of the file to open
    with open(file_name) as f:  # it should exist in the project folder
        text = f.read()  # load the file into the variable text
    # no explicit f.close() needed: the with block closes the file
    return text


try:
    text = open_file()  # open the file and store its contents
except FileNotFoundError:
    print("File was not found!")
    text = ""  # fall back to an empty string if the file is missing

lower_text = text.lower()  # convert the text to lowercase
# keep only runs of letters; '+' (not '*') avoids matching empty strings
text_with_out_special_signs = re.findall(r'[a-z]+', lower_text)
counts_of_words = Counter(text_with_out_special_signs)  # count each word
for x in counts_of_words.most_common():  # show results, most frequent first
    print(x)
Answer 0 (score: 1)
If you want to drop words shorter than 3 characters, you can do this:
text_more_than_3_char_words = [w for w in text_with_out_special_signs if len(w) > 2]
counts_of_words = Counter(text_more_than_3_char_words) # transform list in Counter
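
As an alternative to filtering the list afterwards, the minimum length can probably be pushed into the regular expression itself with a `{n,}` quantifier. The sketch below is only an illustration under that assumption: the helper name `count_words` and its `min_length` parameter are made up for this example and are not part of the original code.

```python
import re
from collections import Counter


def count_words(text, min_length=3):
    """Count lowercase alphabetic words with at least min_length letters.

    Hypothetical helper for illustration; not part of the original post.
    """
    # [a-z]{n,} matches runs of n or more letters, so shorter words
    # (and empty strings) never end up in the result.
    pattern = r"[a-z]{%d,}" % min_length
    return Counter(re.findall(pattern, text.lower()))


if __name__ == "__main__":
    sample = "In, on, at: the list, the words; show, clear, list, the"
    # most_common() yields (word, count) pairs, most frequent first
    for word, count in count_words(sample).most_common():
        print(word, count)
```

For letter-only words this should give the same counts as the list comprehension above; which form to use is mostly a matter of taste, although the list comprehension keeps the length rule visible separately from the pattern.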