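Apologies if this has been solved before. I couldn't find any previous answer that addresses my specific problem, so here goes.
The exercise asks the user to enter a .txt file name. The code takes that file and counts the words in it, building a dictionary of word: count pairs. If the file has already been entered and its words counted, the program should refer to a cache, where the earlier counts are stored, instead of recounting.
My problem is creating the nested dictionary of dictionaries, the cache. Here is what I have so far. At the moment, each new .txt file overwrites the dictionary, which prevents it from being used as a cache.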
def main():
    file = input("Enter the file name: ")  #Takes a file input to count the words
    d = {}  #open dictionary of dictionaries: a cache of word counts
    with open(file) as f:
        if f in d:  #check if this file is in cache.
            for word in sorted(d[f]):  #print the result of the word count of an old document.
                print("That file has already been assessed:\n%-12s:%5d" % (word, d[f][word]))
        else:  #count the words in this file and add the count to the cache as a nested list.
            d[f] = {}  #create a nested dictionary within 'd'.
            for line in f:  #counts the unique words within the document.
                words = line.split()
                for word in words:
                    word = word.rstrip("!'?.,")  #clean up punctuation here
                    word = word.upper()  #all words to uppercase here
                    if word not in d[f]:
                        d[f][word] = 1
                    else:
                        d[f][word] = d[f][word] + 1
            for word in sorted(d[f]):  #print the result of the word count of a new document.
                print("%-12s:%5d" % (word, d[f][word]))
    main()  #Run code again to try new file.
main()
Answer 0 (score: 1)
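Easy fix: use the file name, not the file object, as the key.
d[file] = {}
....
d[file][word] = 1 # and so on
This matters because every call to open() returns a new file object f, so a lookup with d[f] can never find the entry stored under an earlier f. The dictionary d also has to be created outside main(), otherwise it starts out empty on every call.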
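To make the point about keys concrete, here is a minimal sketch. It assumes a throwaway temporary file and a dictionary named cache; both are made up for illustration and are not part of the question's code.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the cat and the hat")
    path = tmp.name

cache = {}

with open(path) as f1:
    cache[f1] = {"THE": 2}        # keyed by the file object

with open(path) as f2:
    hit_by_object = f2 in cache   # f2 is a different object from f1, so this is a miss

cache[path] = {"THE": 2}          # keyed by the file name string
hit_by_name = path in cache       # equal strings hash the same, so this is a hit

os.remove(path)
print(hit_by_object, hit_by_name)
```

File objects compare by identity, while strings compare by value, which is why only the name works as a cache key.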
Also, you could use a defaultdict:
from collections import defaultdict

# cache: file name -> {word: count}; module level, so it survives between calls
d = defaultdict(lambda: defaultdict(int))  # the factory takes no arguments

def count(file):
    if file not in d:  # only read files that have not been counted yet
        with open(file) as f:
            for line in f:
                for word in line.split():
                    d[file][word.rstrip("!'?.,").upper()] += 1
    return d[file]

def main():
    file = input("Enter the file name: ")
    if file in d:  # check the cache before counting
        print("That file has already been assessed, blah blah")
    counts = count(file)
    for word in sorted(counts):  # print the word counts for this document
        print("%-12s:%5d" % (word, counts[word]))

if __name__ == "__main__":
    main()
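One detail worth checking when a defaultdict is used as a cache: a membership test does not create an entry, but indexing does. The short sketch below illustrates this; the key "a.txt" and the name cache are made up for the example.

```python
from collections import defaultdict

# The factory passed to defaultdict is called with no arguments.
cache = defaultdict(lambda: defaultdict(int))

before = "a.txt" in cache      # membership test: no entry is created
cache["a.txt"]["THE"] += 1     # indexing creates the inner dict and the counter
after = "a.txt" in cache

print(before, after, cache["a.txt"]["THE"])
```

So the "already assessed" check has to use in (or not in) before the first d[file] access, otherwise every file looks cached.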
Answer 1 (score: 1)
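Your problem is that you re-initialize the dictionary every time you call main(). You need to declare it outside the loop in which you ask the user for the file name.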
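The whole process can also be sped up using collections.Counter() and str.translate:
from collections import Counter
import string
import os.path

d = {}  # cache: file name -> Counter of words; created once, outside the loop
# translation table that deletes ASCII punctuation (Python 3 form of translate)
strip_punct = str.maketrans('', '', string.punctuation)
while True:
    input_file = input("Enter the file name: ")
    if not os.path.isfile(input_file):
        print('File not found, try again')
        continue
    if input_file in d:  # membership test also handles empty files correctly
        print('Already found, top 5 words:')
    else:
        with open(input_file) as f:
            d[input_file] = Counter(f.read().upper().translate(strip_punct).split())
    for word, freq in sorted(d[input_file].items(), reverse=True, key=lambda x: x[1])[:5]:
        print(word.ljust(20) + str(freq).rjust(5))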
This prints the five most frequent words in the file along with their frequencies. If it has already seen the file, it prints a notice saying so. Example output:
THE                    24
OF                     15
AND                    12
A                      10
MODEL                   9
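As a possible simplification, Counter already provides most_common(), which can replace the sorted(...)[:5] step. The sketch below is one way to write it; the helper top_words and the sample text are hypothetical and not part of the answer above.

```python
from collections import Counter
import string

# Translation table that deletes ASCII punctuation.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def top_words(text, n=5):
    """Return the n most common upper-cased words in text (hypothetical helper)."""
    words = text.upper().translate(STRIP_PUNCT).split()
    return Counter(words).most_common(n)

sample = "The cat and the hat. The end!"
for word, freq in top_words(sample, 3):
    print(word.ljust(20) + str(freq).rjust(5))
```

The cache itself would stay the same: store the Counter under the file name and call most_common(5) on the stored value when printing.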