我有磁盘I / O密集型代码,下面提供了代码片段:
if __name__ == '__main__':
files = get_files()
start = time.time()
for i, fpath in enumerate(files):
print("%d/%d processing. " % (i, len(files)))
with open(fpath) as f:
contents = f.read()
# Uses NLTK and other rules for spliting and cleaning.
sents = split_into_sentences_and_clean(contents)
doc = "\n".join(sents)
# write cleaned sentences again back to disk
end = time.time()
print("Loading and processing documents took %s seconds" % str(end - start))
jupyter笔记本中的相同代码工作速度提高了4倍。我能做错什么?
其他细节: 操作系统:Mac OSX Python 3.6
任何指出神秘面纱的指针都会很棒。
UPDATE1 : 问题在于列表理解。我认为这会产生微不足道的影响。列表压缩中有一个额外条件,如pycharm场景中所述。任何关于为什么这种单一条件恶化性能的见解都会有所帮助。
def strip_non_ascii_SLOW(text):
return ''.join(i for i in text if all([ord(i)<128, ord(i)>64]) )
def strip_non_ascii_FAST(text):
# 65-A, 122-z
l = list()
for i in text:
j = ord(i)
if j < 128 and j > 64:
l.append(i)
s = ''.join(l)
return s
由于 Sri Harsha