
Asked: 2017-03-08 10:47:30

Tags: python macos pycharm jupyter-notebook

I have disk I/O-intensive code; a snippet is provided below:

import time

if __name__ == '__main__':
    files = get_files()

    start = time.time()

    for i, fpath in enumerate(files):
        print("%d/%d processing. " % (i, len(files)))
        with open(fpath) as f:
            contents = f.read()
            # Uses NLTK and other rules for splitting and cleaning.
            sents = split_into_sentences_and_clean(contents)
            doc = "\n".join(sents)
            # write cleaned sentences again back to disk
    end = time.time()
    print("Loading and processing documents took %s seconds" % str(end - start))


Other details: OS: Mac OS X, Python 3.6


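UPDATE1: The problem was in the list comprehension. I had assumed it would have a negligible impact. The list comprehension has an extra condition, as noted for the PyCharm scenario. Any insight into why this single condition degrades performance would be helpful.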

def strip_non_ascii_SLOW(text):
    return ''.join(i for i in text if all([ord(i)<128, ord(i)>64]) )

def strip_non_ascii_FAST(text):
    # 65-A, 122-z
    l = list()
    for i in text:
        j = ord(i)
        if j < 128 and j > 64:
            # keep the character only when its code point is in range
            l.append(i)
    s = ''.join(l)
    return s
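
To measure how much the extra condition costs, a small `timeit` comparison along the following lines may help. This is only a sketch: the function names and the sample string are made up for illustration and are not from my real code. One plausible explanation for the slowdown is that `all([...])` builds a new two-element list, calls `ord()` twice, and makes an extra function call for every single character, whereas a chained comparison does none of that.

```python
import timeit


def strip_with_all(text):
    # Same filter as strip_non_ascii_SLOW: for every character this builds
    # a two-element list, calls ord() twice, and calls all().
    return ''.join(c for c in text if all([ord(c) < 128, ord(c) > 64]))


def strip_with_chained_comparison(text):
    # Same filter with a single ord() call and no per-character list.
    return ''.join(c for c in text if 64 < ord(c) < 128)


# Hypothetical test input, not taken from the real documents.
sample = "Hello, wörld! 123 " * 1000

# Both variants should keep exactly the same characters.
assert strip_with_all(sample) == strip_with_chained_comparison(sample)

for fn in (strip_with_all, strip_with_chained_comparison):
    seconds = timeit.timeit(lambda: fn(sample), number=20)
    print(f"{fn.__name__}: {seconds:.3f} s")
```

The absolute timings will depend on the machine and the Python version, so only the ratio between the two lines of output is meaningful.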

Thanks, Sri Harsha

0 answers:
