Question

I want to write a Python module with a function that first loads a large file and then uses its contents to filter the argument, like this:

def filter(word_list):
    filtered_words = []
    # The file is re-read on every call, which is what I want to avoid
    special_words = [line.strip() for line in open('special_words.txt', 'r')]
    for w in word_list:
        if w not in special_words:
            filtered_words.append(w)
    return filtered_words

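The problem is that I want to load this file only once for the whole module, not every time the function is called. In Java I could use a static block for this, but what options do I have in Python?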

Answer 1

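You can load the file into a list at module (global) scope; that code runs only once, the first time the module is imported. Later imports reuse the module object already cached in sys.modules.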
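A minimal sketch of that behavior, assuming a throwaway directory: it writes a hypothetical module named wordfilter whose top-level code reads a stand-in special_words.txt and counts how often it does so, then imports the module twice. The module name, file contents, and LOAD_COUNT counter are illustrative assumptions, not part of the question.

```python
import importlib
import os
import sys
import tempfile

tmpdir = tempfile.mkdtemp()

# Hypothetical stand-in for special_words.txt
words_path = os.path.join(tmpdir, "special_words.txt")
with open(words_path, "w") as f:
    f.write("foo\nbar\n")

# Hypothetical module "wordfilter": the file is read at module (global) scope
module_source = f'''
LOAD_COUNT = 0
with open({words_path!r}) as handle:
    special_words = [line.strip() for line in handle]
    LOAD_COUNT += 1

def filter(word_list):
    return [w for w in word_list if w not in special_words]
'''
with open(os.path.join(tmpdir, "wordfilter.py"), "w") as f:
    f.write(module_source)

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()  # make sure the freshly written module is found

first = importlib.import_module("wordfilter")
second = importlib.import_module("wordfilter")  # served from sys.modules

print(first is second)               # True: the same module object
print(first.LOAD_COUNT)              # 1: top-level code ran once
print(first.filter(["foo", "baz"]))  # ['baz']
```

The second import never re-executes the module body, which is the Python counterpart of a Java static initializer.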

Answer 2

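It sounds to me like you want a memoized function: when you call it with arguments it has already seen, it returns the stored result instead of recomputing it. This particular implementation comes from http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize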

This may be overkill for this problem, but memoization is a very useful pattern to know.

import functools

class memoized(object):
    '''Decorator. Caches a function's return value each time it is called.
    If called later with the same arguments, the cached value is returned
    (not reevaluated).
    '''
    def __init__(self, func):
        self.func = func
        self.cache = {}

    def __call__(self, *args):
        try:
            # hash() raises TypeError if any argument (a list, for instance)
            # is unhashable. Better to not cache than blow up.
            hash(args)
        except TypeError:
            return self.func(*args)
        if args in self.cache:
            return self.cache[args]
        else:
            value = self.func(*args)
            self.cache[args] = value
            return value

    def __repr__(self):
        '''Return the function's docstring (or its repr if it has none).'''
        return self.func.__doc__ or repr(self.func)

    def __get__(self, obj, objtype=None):
        '''Support instance methods.'''
        return functools.partial(self.__call__, obj)

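@memoized
def get_words(fname):
    # Read and strip the file once; later calls with the same name hit the cache
    with open(fname, 'r') as handle:
        return [line.strip() for line in handle]

@memoized  # word_list is a list (unhashable), so this call is never cached
def filter(word_list):
    filtered_words = []
    special_words = get_words("special_words.txt")
    for w in word_list:
        if w not in special_words:
            filtered_words.append(w)
    return filtered_words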
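Since Python 3.2 the standard library ships functools.lru_cache, which can stand in for the hand-written memoized class; this is a swapped-in technique, not the wiki recipe above. The sketch below assumes a hypothetical filter_words helper and a READ_COUNT counter added only to show that the file is read once. It returns a frozenset so that callers cannot mutate the shared cached value.

```python
import functools
import os
import tempfile

READ_COUNT = 0  # counts real file reads, for illustration only

@functools.lru_cache(maxsize=None)
def get_words(fname):
    """Read fname once per distinct name; later calls return the cached set."""
    global READ_COUNT
    READ_COUNT += 1
    with open(fname) as handle:
        # frozenset: immutable, so the shared cached value cannot be corrupted
        return frozenset(line.strip() for line in handle)

def filter_words(word_list, fname="special_words.txt"):
    """Hypothetical helper: drop every word that appears in fname."""
    special_words = get_words(fname)
    return [w for w in word_list if w not in special_words]

# Usage with a throwaway file standing in for special_words.txt
path = os.path.join(tempfile.mkdtemp(), "special_words.txt")
with open(path, "w") as f:
    f.write("foo\nbar\n")

print(filter_words(["foo", "baz", "bar", "qux"], path))  # ['baz', 'qux']
print(filter_words(["bar"], path))                        # []
print(READ_COUNT)                                         # 1
```

Only get_words is cached; filter_words itself takes a list, which lru_cache would reject as unhashable.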

As a side note, a neat trick is

 print(set(word_list).difference(special_words))

which should be much faster (assuming you don't mind losing duplicates and the original order of the words).
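A small sketch of that trade-off, with made-up sample words: the set difference drops the repeated "apple" and returns an unordered set, while a list comprehension tested against a set keeps order and duplicates and still does an average O(1) hash lookup per word.

```python
word_list = ["apple", "foo", "pear", "apple", "bar"]  # made-up sample input
special_words = {"foo", "bar"}

# Set difference: fast, but duplicates are lost and order is not preserved
unique_kept = set(word_list).difference(special_words)
print(unique_kept)   # a set with 'apple' and 'pear', in arbitrary order

# Comprehension against a set: keeps order and duplicates
ordered_kept = [w for w in word_list if w not in special_words]
print(ordered_kept)  # ['apple', 'pear', 'apple']
```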

Answer 3

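You want to build the set of words up front, so that the file is not read every time the function is called. A set also makes each membership test an average O(1) hash lookup instead of a linear scan of a list. In addition, you can simplify the filter function with a list comprehension: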

with open('special_words.txt', 'r') as handle:
    special_words = {line.strip() for line in handle}

def filter(word_list):
    return [word for word in word_list if word not in special_words]

Open the file in the module and keep its contents in memory.
