I want to implement a Python module with a method that first loads a large file and then applies a filter to its argument, like this:
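def filter(word_list):
    filtered_words = []
    # Re-reads the whole file on every call; this is what the question wants to avoid.
    with open('special_words.txt', 'r') as handle:
        special_words = [line.strip() for line in handle]
    for w in word_list:
        if w not in special_words:
            filtered_words.append(w)
    return filtered_words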
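The problem is that I want to load this file only once for the whole module, not on every call of this method. In Java I could use a static block for this, but what options do I have in Python?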
Answer 0 (score: 4)
You can load the file into a list at module-level (global) scope; that code runs only once, the first time the module is imported.
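A minimal sketch of why this works: Python caches imported modules in `sys.modules`, so module-level code runs only on the first import. The demo below writes a hypothetical module `wordfilter.py` and a sample `special_words.txt` into a temporary directory (both are assumptions made only so the example runs on its own), imports the module twice, and counts how many times the file was actually loaded.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Demo setup (assumption): a throwaway directory holding a hypothetical
# module "wordfilter" and a small word file next to it.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "special_words.txt").write_text("foo\nbar\n")
(tmp_dir / "wordfilter.py").write_text(textwrap.dedent("""
    from pathlib import Path

    LOAD_COUNT = 0  # counts how often the file is read

    def _load_special_words():
        global LOAD_COUNT
        LOAD_COUNT += 1
        path = Path(__file__).with_name("special_words.txt")
        with path.open() as handle:
            return {line.strip() for line in handle}

    # Module scope: executed once, when the module is first imported.
    SPECIAL_WORDS = _load_special_words()

    def filter_words(word_list):
        return [w for w in word_list if w not in SPECIAL_WORDS]
"""))
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()  # make sure the new file is visible to the import system

first = importlib.import_module("wordfilter")
second = importlib.import_module("wordfilter")  # returned from sys.modules, not re-executed
print(first is second, first.LOAD_COUNT)  # True 1
print(first.filter_words(["foo", "baz", "bar", "qux"]))  # ['baz', 'qux']
```

In a real project the module would simply live on disk next to the word file; the temporary directory only replaces that layout for the demo.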
Answer 1 (score: 2)
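To me this sounds like you want a memoized function: when you call it with arguments it has already seen, it returns the stored result instead of recomputing it. This particular implementation comes from http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize
It may be overkill for this problem, but memoization is a very useful pattern to know.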
import functools

class memoized(object):
    '''Decorator. Caches a function's return value each time it is called.
    If called later with the same arguments, the cached value is returned
    (not reevaluated).
    '''
    def __init__(self, func):
        self.func = func
        self.cache = {}

    def __call__(self, *args):
        # isinstance(args, Hashable) is always True for a tuple, even one that
        # contains lists, so actually try to hash the arguments instead.
        try:
            hash(args)
        except TypeError:
            # uncacheable, e.g. an argument is a list.
            # better to not cache than blow up.
            return self.func(*args)
        if args in self.cache:
            return self.cache[args]
        value = self.func(*args)
        self.cache[args] = value
        return value

    def __repr__(self):
        '''Return the function's docstring (or its repr if it has none).'''
        return self.func.__doc__ or repr(self.func)

    def __get__(self, obj, objtype=None):
        '''Support instance methods.'''
        return functools.partial(self.__call__, obj)
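@memoized
def get_words(fname):
    with open(fname, 'r') as handle:
        return list(handle)

@memoized
def filter(word_list):
    # Note: word_list is a list and therefore unhashable, so this result
    # cannot be cached by argument; only get_words() benefits here.
    filtered_words = []
    special_words = [line.strip() for line in get_words("special_words.txt")]
    for w in word_list:
        if w not in special_words:
            filtered_words.append(w)
    return filtered_words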
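On Python 3.2 and later, the standard library ships an equivalent decorator, `functools.lru_cache`, so the hand-written class is optional. A minimal sketch, assuming the same one-word-per-line file format; the demo writes a temporary file in place of `special_words.txt`, and `get_special_words` / `filter_words` are hypothetical names (the latter avoids shadowing the built-in `filter`). Only the loader is cached, because `lru_cache` requires hashable arguments and would raise `TypeError` for a list.

```python
import functools
import tempfile

# Demo setup (assumption): write a temporary word file so the sketch runs standalone.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("foo\nbar\n")
WORDS_FILE = tmp.name  # stands in for "special_words.txt"

read_count = 0  # demo counter: how many times the file is actually opened

@functools.lru_cache(maxsize=None)
def get_special_words(fname):
    """Read fname once per distinct name; later calls are served from the cache."""
    global read_count
    read_count += 1
    with open(fname) as handle:
        # frozenset is immutable, so callers cannot alter the cached value.
        return frozenset(line.strip() for line in handle)

def filter_words(word_list):
    special = get_special_words(WORDS_FILE)
    return [w for w in word_list if w not in special]

print(filter_words(["foo", "baz"]))  # ['baz']
print(filter_words(["bar", "qux"]))  # ['qux']
print(read_count)                    # 1
print(get_special_words.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
```

On Python 3.9+, `functools.cache` is shorthand for `lru_cache(maxsize=None)`.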
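As a side note, a neat trick is
print(set(word_list).difference(special_words))
which should be much faster (assuming you don't care about losing duplicates or the original order).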
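A quick illustration of that trade-off, using made-up sample data: the set version drops duplicates and does not preserve order, while a list comprehension that tests membership against a set keeps both and still gets average O(1) lookups.

```python
word_list = ["b", "a", "x", "b", "c"]  # sample data (made up)
special_words = {"x"}

# Set difference: fast, but duplicates and ordering are lost.
as_set = set(word_list).difference(special_words)

# List comprehension with set membership: keeps duplicates and order.
as_list = [w for w in word_list if w not in special_words]

print(sorted(as_set))  # ['a', 'b', 'c']
print(as_list)         # ['b', 'a', 'b', 'c']
```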
Answer 2 (score: 1)
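You want to build the set of words ahead of time so that the file isn't read every time the function is called. You can also simplify the filter function with a list comprehension: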
with open('special_words.txt', 'r') as handle:
    special_words = {line.strip() for line in handle}

def filter(word_list):
    return [word for word in word_list if word not in special_words]