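I'm trying to speed up my application, and I found that the simple little function below (compute_ave_freq) is actually one of the biggest time sinks. The culprit seems to be the point where it unpickles an NLTK FreqDist; that takes an obscene amount of time.
Of course, even that obscene amount of time is less than half of what it takes to recompute the FreqDist. Is there a better way to save an NLTK FreqDist object? I tried serializing it to JSON, but that saves it as a plain dictionary, losing much of the NLTK functionality I need.
Here is the code: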
import pickle

def compute_ave_freq(word_forms):
    # The FreqDist is unpickled on every call
    fd = pickle.load(open("data/fd.txt", 'rb'))
    total_freq = 0
    for form in word_forms:
        freq = fd.freq(form)
        total_freq += freq
    try:
        ave_freq = total_freq/len(word_forms)
    except ZeroDivisionError:
        ave_freq = 0
    return ave_freq
Here is the LineProfiler output:
Total time: 0.197121 s
File: /home/username/development/appname/filename.py
Function: compute_ave_freq at line 25
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    25                                           def compute_ave_freq(word_forms, debug=False):
    26                                               # word_forms is a list of morphological variations of a word, such as
    27                                               # ['كتبوا', 'كتبو', 'كتبنا', 'كتبت']
    28
    29         1        78580  78580.0     79.1      fd = pickle.load(open("data/fd.txt", 'rb'))
    30         1            3      3.0      0.0      total_freq = 0
    31         5           10      2.0      0.0      for form in word_forms:
    32         4        20676   5169.0     20.8          freq = fd.freq(form)
    33         4            9      2.2      0.0          if debug==True:
    34                                                       print(form, '\n', freq)
    35         4            6      1.5      0.0          total_freq += freq
    36         1            1      1.0      0.0      try:
    37         1            3      3.0      0.0          ave_freq = total_freq/len(word_forms)
    38                                               except ZeroDivisionError:
    39                                                   ave_freq = 0
    40         1            1      1.0      0.0      return ave_freq
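Thanks!
Answer 0 (score: 1)
As suggested in the comments, moving the fd variable outside the function, so that the file is unpickled once rather than on every call, should solve the problem: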
import pickle

# Load the FreqDist once, at import time, and close the file afterwards
with open("data/fd.txt", 'rb') as f:
    fd = pickle.load(f)

def compute_ave_freq(word_forms):
    total_freq = 0
    for form in word_forms:
        freq = fd.freq(form)
        total_freq += freq
    try:
        ave_freq = total_freq/len(word_forms)
    except ZeroDivisionError:
        ave_freq = 0
    return ave_freq
However, since you are writing a sum-and-average function, here is a simpler implementation:
import pickle

with open("data/fd.txt", 'rb') as f:
    fd = pickle.load(f)

def compute_ave_freq(word_forms):
    try:
        return sum(fd.freq(form) for form in word_forms) / len(word_forms)
    except ZeroDivisionError:
        return 0
Or:
import pickle

with open("data/fd.txt", 'rb') as f:
    fd = pickle.load(f)

def compute_ave_freq(word_forms):
    n = len(word_forms)
    if n > 0:
        return sum(fd.freq(form) for form in word_forms) / n
    else:
        return 0
Or even simpler:
import pickle

with open("data/fd.txt", 'rb') as f:
    fd = pickle.load(f)

def compute_ave_freq(word_forms):
    n = len(word_forms)
    return sum(fd.freq(form) for form in word_forms) / n if n > 0 else 0
Or with a lambda (the empty-list check then has to happen at the call site):
import pickle
with open("data/fd.txt", 'rb') as f:
    fd = pickle.load(f)

compute_ave_freq = lambda x: sum(fd.freq(i) for i in x) / len(x)
ave_freq = compute_ave_freq(word_forms) if len(word_forms) > 0 else 0
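The profile in the question also shows fd.freq(form) taking about 20% of the time, roughly 5 ms per call. One possible explanation, which I have not verified against the NLTK version in use, is that freq() recomputes the total sample count on every call. If so, computing the total once and dividing raw counts by it may help. A rough sketch under that assumption, with a collections.Counter standing in for the FreqDist and hypothetical tokens w1, w2, w3:

```python
from collections import Counter

# Stand-in for the unpickled FreqDist (assumption: it behaves like a Counter).
fd = Counter({"w1": 6, "w2": 3, "w3": 1})

# Total number of samples, computed once instead of on every lookup.
TOTAL = sum(fd.values())


def compute_ave_freq(word_forms):
    """Average relative frequency of the given forms; 0 for an empty list."""
    if not word_forms or TOTAL == 0:
        return 0
    # Sum the raw counts first, then normalize once.
    return sum(fd[form] for form in word_forms) / (TOTAL * len(word_forms))
```

Forms that never occur contribute a count of 0, so they lower the average rather than raising an error. Profiling again on the real data is the only way to tell whether this matters in practice.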