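Nested dictionary for a word-count cache

Time: 2017-02-02 15:52:45

Tags: python dictionary caching

Apologies if this has been solved before. I couldn't find any previous answer that addresses my specific problem, so here goes.

The exercise asks the user to enter a .txt file name. The code takes that file and counts the words in it, building a dictionary of word: count pairs. If the file has been entered before and its words already counted, the program should refer to a cache, where the previous count is stored, instead of recounting.

My problem is creating the nested dictionary of dictionaries that serves as the cache. Below is what I have so far. Currently, every new .txt file overwrites the dictionary, which prevents it from being used as a cache.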

def main():

    file = input("Enter the file name: ")       #Takes a file input to count the words

    d = {}    #open dictionary of dictionaries: a cache of word counts

    with open(file) as f:

        if f in d:      #check if this file is in cache.

            for word in sorted(d[f]):       #print the result of the word count of an old document.
                print("That file has already been assessed:\n%-12s:%5d" % (word, d[f][word]))

        else:       #count the words in this file and add the count to the cache as a nested list.

            d[f] = {}       #create a nested dictionary within 'd'.

            for line in f:              #counts the unique words within the document.
                words = line.split()

                for word in words:
                    word = word.rstrip("!'?.,")     #clean up punctuation here
                    word = word.upper()             #all words to uppercase here

                    if word not in d[f]:
                        d[f][word] = 1
                    else:
                        d[f][word] = d[f][word] + 1

        for word in sorted(d[f]):       #print the result of the word count of a new document.
            print("%-12s:%5d" % (word, d[f][word]))


    main()      #Run code again to try new file.

main()

2 answers:

Answer 0 (score: 1)

Easy fix: key the cache by the file name instead of the file object:

d[file] = {}
....
d[file][word] = 1  # and so on

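This works because the file name `file` is the same string every time the same file is entered, so `d[file]` still refers to the same entry in `d`. By contrast, `f` is the file object returned by `open()`, and each call to `open()` creates a new object, so `f in d` is never true.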
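A minimal sketch of why the choice of key matters, assuming a throwaway file created with `tempfile` (the file and the names `by_object` and `by_name` are illustrative, not from the question). Each call to `open()` returns a distinct file object, so a dictionary keyed by that object never matches on a later call, whereas the file name string does:

```python
import os
import tempfile

# Create a small throwaway file to open twice (illustrative only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello world")
    path = tmp.name

by_object = {}   # keyed by file object, as in the question
by_name = {}     # keyed by file name, as in the fix

with open(path) as f:
    by_object[f] = {"HELLO": 1}
    by_name[path] = {"HELLO": 1}

with open(path) as f:
    object_hit = f in by_object    # a new file object, so no match
    name_hit = path in by_name     # the same string, so the entry is found

print(object_hit, name_hit)
os.remove(path)
```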

Also, you can use a defaultdict:

from collections import defaultdict

# cache: file name -> {word: count}; the factory must take no arguments
d = defaultdict(lambda: defaultdict(int))

def count(file):
    if file not in d:       # only read the file the first time it is requested
        with open(file) as f:
            for line in f:
                for word in line.split():
                    d[file][word.rstrip("!'?.,").upper()] += 1
    return d[file]

def main():
    file = input("Enter the file name: ")
    if file in d:           # check before counting, otherwise this is always true
        print("That file has already been assessed, blah blah")
    counts = count(file)
    for word in sorted(counts):       #print the result of the word count.
        print("%-12s:%5d" % (word, counts[word]))

if __name__ == "__main__":
    main()
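A quick sketch of how a nested `defaultdict` behaves, using a made-up file name and an in-memory word list rather than a real file. Note that the factory passed to `defaultdict` is called with no arguments, and that merely indexing a missing key creates an entry, which is why membership is tested with `in`:

```python
from collections import defaultdict

# The outer dict creates an inner defaultdict(int) for any missing file name;
# the inner dict creates a 0 for any missing word.
cache = defaultdict(lambda: defaultdict(int))

for word in "a b a".split():
    cache["demo.txt"][word.upper()] += 1   # "demo.txt" is a hypothetical key

print(dict(cache["demo.txt"]))

# Testing with `in` does not create an entry; indexing cache["other.txt"] would.
print("other.txt" in cache)
```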

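Answer 1 (score: 1)

Your problem is that you re-initialize the dictionary every time you call main(). You need to declare it outside the loop in which you ask the user for the file name.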
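One way this could look, as a sketch only: the cache is created once at module level and a helper (here called `word_counts`, a name chosen for illustration) consults it before reading the file. The demonstration uses a throwaway file and deletes it before the second call to show that the second result comes from the cache:

```python
import os
import tempfile

cache = {}   # created once, outside any function or loop, so it survives between requests

def word_counts(path):
    """Return {word: count} for the file, counting only on the first request."""
    if path not in cache:
        counts = {}
        with open(path) as f:
            for word in f.read().split():
                word = word.rstrip("!'?.,").upper()
                counts[word] = counts.get(word, 0) + 1
        cache[path] = counts
    return cache[path]

# Demonstration with a throwaway file (illustrative only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("The cat saw the dog. The end!")
    demo_path = tmp.name

first = word_counts(demo_path)    # reads and counts the file
os.remove(demo_path)              # the file is gone now...
second = word_counts(demo_path)   # ...but the cached counts are still returned
print(second)
```

Passing the cache in as a parameter instead of using a module-level variable would work equally well; the point is only that it must outlive a single request.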

Using collections.Counter() and string.translate can also speed up the whole process:

from collections import Counter
import string
import os.path

d = {}    # cache: file name -> Counter of words

while True:
    input_file = input("Enter the file name: ")
    if not os.path.isfile(input_file):
        print('File not found, try again')
        continue
    if input_file in d:
        print('Already found, top 5 words:')
    else:
        with open(input_file) as f:
            # upper-case, delete punctuation, then split on whitespace (Python 3)
            text = f.read().upper().translate(str.maketrans('', '', string.punctuation))
            d[input_file] = Counter(text.split())
    for word, freq in d[input_file].most_common(5):
        print(word.ljust(20) + str(freq).rjust(5))

This prints the 5 most common words in the file along with their frequencies. If it has already seen the file, it prints a corresponding notice. Sample output:

THE                    24
OF                     15
AND                    12
A                      10
MODEL                   9
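As a small illustration of the two tools named in this answer, `collections.Counter` and punctuation removal via `translate`, here they are applied to an in-memory string instead of a file. The sample sentence is made up, and the Python 3 form `str.maketrans('', '', ...)` is assumed:

```python
from collections import Counter
import string

text = "The model of the world, and the model of a model."

# Build a translation table that deletes every punctuation character.
strip_punct = str.maketrans("", "", string.punctuation)

# Upper-case, strip punctuation, split on whitespace, then count.
counts = Counter(text.upper().translate(strip_punct).split())

for word, freq in counts.most_common(2):
    print(word.ljust(20) + str(freq).rjust(5))
```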