Python: saved pickle Counter contains data, but a function cannot load the file

Time: 2014-12-20 18:15:26

Tags: python dictionary pickle

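I'm trying to build a foreign-language frequency dictionary / vocabulary learner.

I would like the program to:

  1. Process a book/text file, split the text into individual words and sort them by frequency (I use Counter() for this)
  2. Save the Counter() to a pickle file so that I don't have to process the book every time I run the program
  3. Access the pickle file and pull out the Nth most common word (easily done with the most_common() method)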
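The problem is that once I have processed a book and saved it to a pickle file, I cannot access it again. The function that does this loads an empty dictionary, even though I can see that the pickle file does contain data when I inspect it.

In addition, if I load the pickle file manually (with pickle.load()) and pull out the Nth most common word by calling most_common() myself, rather than through my custom functions that load the pickle and pull out the Nth most common word, it works perfectly.

I suspect something is wrong with the custom function that loads the pickle file, but I can't figure out what it is.

Here is the code: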

    import string
    import collections
    import pickle
    
    freq_dict = collections.Counter()
    dfn_dict = dict()
    
    def save_dict(name, filename):
        pickle.dump(name, open('{0}.p'.format(filename), 'wb'))
    
    #Might be a problem with this
    def load_dict(name, filename):
        name = pickle.load(open('{0}.p'.format(filename), 'rb'))
    
    def cleanedup(fh):
        for line in fh:
            word = ''
            for character in line:
                if character in string.ascii_letters:
                    word += character
                else:
                    yield word
                    word = ''
    
    #Opens a foreign language textfile and adds all unique
    #words in it, to a Counter, ordered by frequency
    def process_book(textname):
        with open (textname) as doc:
            freq_dict.update(cleanedup(doc))
        save_dict(freq_dict, 'svd_f_dict')
    
    #Shows the Nth most frequent word in the frequency dict
    def show_Nth_word(N):
        load_dict(freq_dict, 'svd_f_dict')
        return freq_dict.most_common()[N]
    
    #Shows the first N most frequent words in the freq. dictionary    
    def show_N_freq_words(N):    
        load_dict(freq_dict, 'svd_f_dict')
        return freq_dict.most_common(N)
    
    #Presents a word to the user, allows user to define it
    #adds the word and its definition to another dictionary
    #which is used to store only the word and its definition
    def define_word(word):
        load_dict(freq_dict, 'svd_f_dict')
        load_dict(dfn_dict, 'svd_d_dict')
        if word in freq_dict:
            definition = (input('Please define ' + str(word) + ':'))
            dfn_dict[word] = definition
        else:
            return print('Word not in dictionary!')
        save_dict(dfn_dict, 'svd_d_dict')
    

Here is an attempt to pull out the Nth most common word using both approaches (manually and with the function):

    from dictionary import *
    import pickle
    
    #Manual, works
    freq_dict = pickle.load(open('svd_f_dict.p', 'rb'))
    print(freq_dict.most_common()[2])
    
    #Using a function defined in the other file, doesn't work
    word = show_Nth_word(2)
    

Thanks for your help!

1 answer:

Answer 0 (score: 3)

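Your load_dict function stores the result of unpickling in the local variable 'name'. This does not modify the object you passed to the function as an argument; it only rebinds the local name, which is discarded when the function returns.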
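To make the difference concrete, here is a minimal sketch that contrasts rebinding a parameter with mutating the object it refers to. The helper names rebind and mutate and the sample word 'hej' are made up for this illustration and are not part of the original code.

```python
import collections

def rebind(name):
    # Hypothetical helper: the assignment binds the local name to a new object.
    # The caller's Counter is left untouched.
    name = collections.Counter({'hej': 3})

def mutate(name):
    # Hypothetical helper: update() changes the object the caller passed in,
    # so the change is visible outside the function.
    name.update({'hej': 3})

freq_dict = collections.Counter()

rebind(freq_dict)
after_rebind = dict(freq_dict)   # snapshot taken after rebinding

mutate(freq_dict)
after_mutate = dict(freq_dict)   # snapshot taken after in-place mutation

print(after_rebind, after_mutate)
```

The first snapshot stays empty, which matches the empty dictionary described in the question. Mutating in place would also work, but returning the loaded object is usually the simpler option.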

Instead, you need to return the result of calling pickle.load() from your load_dict() function:

def load_dict(filename):
    return pickle.load(open('{0}.p'.format(filename), 'rb'))

Then assign it to your variable:

freq_dict = load_dict('svd_f_dict')
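Putting this together, a round trip might look like the sketch below. It rests on some assumptions: the extra filename parameter on show_Nth_word, the temporary directory, and the sample words are all added for illustration, and the with blocks are a stylistic choice that closes the files explicitly rather than something the original code did.

```python
import collections
import os
import pickle
import tempfile

def save_dict(obj, filename):
    # The with block closes the file once pickling is done.
    with open('{0}.p'.format(filename), 'wb') as fh:
        pickle.dump(obj, fh)

def load_dict(filename):
    # Return the unpickled object instead of assigning it to a parameter.
    with open('{0}.p'.format(filename), 'rb') as fh:
        return pickle.load(fh)

def show_Nth_word(N, filename):
    # Assumption: the file name is passed in rather than hard-coded.
    freq_dict = load_dict(filename)
    return freq_dict.most_common()[N]

# Demo with made-up data, written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
base = os.path.join(tmp_dir, 'svd_f_dict')
save_dict(collections.Counter('a a a b b c'.split()), base)

second = show_Nth_word(1, base)
print(second)  # ('b', 2)
```

Note that freq_dict inside show_Nth_word is a local variable here. If the module-level freq_dict should be replaced instead, the assignment would have to happen at module level or under a global declaration.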