Question

我有磁盘I / O密集型代码，下面提供了代码片段：

if __name__ == '__main__':
files = get_files()

start = time.time()

for i, fpath in enumerate(files):
    print("%d/%d processing. " % (i, len(files)))
    with open(fpath) as f:
        contents = f.read()
        # Uses NLTK and other rules for spliting and cleaning. 
        sents = split_into_sentences_and_clean(contents)
        doc = "\n".join(sents)            
        # write cleaned sentences again back to disk
end = time.time()
print("Loading and processing documents took %s seconds" % str(end - start))

jupyter笔记本中的相同代码工作速度提高了4倍。我能做错什么？

其他细节：操作系统：Mac OSX Python 3.6

任何指出神秘面纱的指针都会很棒。

UPDATE1 ：问题在于列表理解。我认为这会产生微不足道的影响。列表压缩中有一个额外条件，如pycharm场景中所述。任何关于为什么这种单一条件恶化性能的见解都会有所帮助。

def strip_non_ascii_SLOW(text):
    return ''.join(i for i in text if all([ord(i)<128, ord(i)>64]) )


def strip_non_ascii_FAST(text):
    # 65-A, 122-z
    l = list()
    for i in text:
        j = ord(i)
        if j < 128 and j > 64:
            l.append(i)

    s = ''.join(l)
    return s

由于 Sri Harsha

Jupyter笔记本很快，而Pycharm非常慢。这是预期的吗？

0 个答案: