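Open a file in a module and keep it in memory

Time: 2013-06-18 15:44:32

Tags: python

I want to implement a Python module with a single method that first loads a large file and then filters its argument against it, like this: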

def filter(word_list):
    filtered_words = []
    # The file is re-read on every call, which is the problem being asked about.
    with open('special_words.txt', 'r') as handle:
        special_words = [line.strip() for line in handle]
    for w in word_list:
        if w not in special_words:
            filtered_words.append(w)
    return filtered_words

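The problem is that I want to load this file only once per execution, not every time the method is called. In Java I could use a static block for this, but what options do I have in Python?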

3 answers:

Answer 0 (score: 4)

You can load the file into a list at the module's global scope; that code runs only once, the first time the module is imported.
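A minimal sketch of this approach follows. The names `SPECIAL_WORDS` and `filter_words` and the temporary stand-in for special_words.txt are assumptions made for illustration, not part of the question; in a real module you would simply open your own file.

```python
import os
import tempfile

# Hypothetical stand-in for special_words.txt so the sketch runs on its own.
_demo_dir = tempfile.mkdtemp()
SPECIAL_WORDS_PATH = os.path.join(_demo_dir, "special_words.txt")
with open(SPECIAL_WORDS_PATH, "w") as handle:
    handle.write("foo\nbar\n")

# Module-level code: executed once, the first time the module is imported.
with open(SPECIAL_WORDS_PATH, "r") as handle:
    SPECIAL_WORDS = [line.strip() for line in handle]


def filter_words(word_list):
    """Return the words from word_list that are not special words."""
    return [w for w in word_list if w not in SPECIAL_WORDS]
```

Every later call to `filter_words` reuses the list built at import time, so the file is not opened again.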

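Answer 1 (score: 2)

To me this sounds like you want a memoized function, so that when you call it with arguments it has already seen, it returns the known result instead of recomputing it. This particular implementation is based on the one at http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize

Although it may be overkill for this problem, memoization is a very useful pattern to know.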

import functools

class memoized(object):
   '''Decorator. Caches a function's return value each time it is called.
   If called later with the same arguments, the cached value is returned
   (not reevaluated).
   '''
   def __init__(self, func):
      self.func = func
      self.cache = {}
   def __call__(self, *args):
      try:
         # Cache hit: the arguments are hashable and were seen before.
         return self.cache[args]
      except KeyError:
         # Hashable arguments seen for the first time: compute and store.
         value = self.func(*args)
         self.cache[args] = value
         return value
      except TypeError:
         # uncacheable. a list, for instance.
         # better to not cache than blow up.
         return self.func(*args)
   def __repr__(self):
      '''Return the function's docstring, or its repr if it has none.'''
      return self.func.__doc__ or repr(self.func)
   def __get__(self, obj, objtype):
      '''Support instance methods.'''
      return functools.partial(self.__call__, obj)

@memoized
def get_words(fname):
   # Read the file once per file name; later calls return the cached list.
   with open(fname, 'r') as handle:
      return list(handle)

@memoized
def filter(word_list):
    # A list argument is unhashable, so this call itself is not cached;
    # only get_words() benefits from memoization here.
    filtered_words = []
    special_words = [line.strip() for line in get_words("special_words.txt")]
    for w in word_list:
        if w not in special_words:
            filtered_words.append(w)
    return filtered_words
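On Python 3 the standard library's `functools.lru_cache` offers the same kind of memoization without a hand-written decorator. The sketch below is one possible way to use it here, not the answerer's code; `load_special_words`, `filter_words`, and the temporary demo file are hypothetical names introduced for illustration.

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=None)
def load_special_words(path):
    """Read the file once per distinct path; later calls reuse the cached set."""
    with open(path, "r") as handle:
        return frozenset(line.strip() for line in handle)


def filter_words(word_list, path):
    """Return the words from word_list that are not listed in the file at path."""
    special_words = load_special_words(path)
    return [w for w in word_list if w not in special_words]


# Hypothetical demo file standing in for special_words.txt.
demo_path = os.path.join(tempfile.mkdtemp(), "special_words.txt")
with open(demo_path, "w") as handle:
    handle.write("foo\nbar\n")

result = filter_words(["foo", "baz"], demo_path)
```

Unlike loading at import time, the file is read lazily on the first call, which may be preferable when importing the module should stay cheap.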

As a side note, a neat trick is

print(set(word_list).difference(special_words))

which should be much faster (assuming you don't care about losing duplicates or the original order)
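To make the trade-off concrete, here is a small sketch with made-up sample data (the words below are assumptions, not from the question). It compares the set difference with a list comprehension that tests membership against a set, which keeps duplicates and order while still avoiding a linear scan per word.

```python
word_list = ["a", "b", "a", "c"]   # sample input, made up for illustration
special_words = {"b"}              # a set gives constant-time membership tests

# Set difference: fast, but duplicates and ordering are lost.
unique_remaining = set(word_list).difference(special_words)

# List comprehension against a set: keeps duplicates and original order.
kept_in_order = [w for w in word_list if w not in special_words]
```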

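Answer 2 (score: 1)

You want to build the set of words up front so that the file is not read every time the function is called. You can also simplify the filter function with a list comprehension: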

with open('special_words.txt', 'r') as handle:
    special_words = {line.strip() for line in handle}

def filter(word_list):
    return [word for word in word_list if word not in special_words]