Python performance: counting lines is very slow

Time: 2019-01-30 13:09:30

Tags: python performance

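I am trying to count all the lines of all files in a given directory. The code works as expected, but it seems very slow. The file is large (3 GB): running "wc -l file.txt" in my terminal counts all of its lines (87,000,000) within seconds, while my Python code took 8 minutes 19 seconds to finish. Is there any way to improve the code to make it faster? The print(counter) call is only there so I can see that the process is still running.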

import os
from datetime import datetime
import codecs

start_time = datetime.now()

search_path = "/home/williams/Desktop/DB2"
file_type = ".txt"

def line_counter():
    counter = 0
    for folder, dirs, files in os.walk(search_path):
        for file in files:
            if file.endswith(file_type):
                fullpath = os.path.join(folder, file)
                with codecs.open(fullpath, 'r', encoding='utf-8', errors='ignore') as my_file:
                    for line in my_file:
                        counter +=1
                    print(counter)

    print('I found: ', counter, "lines in the DB!")

line_counter()
elapsed_time = datetime.now() - start_time
print('I counted all lines without problems')
print('The search took: ', elapsed_time, 'to complete')
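
One possible direction, offered as a minimal sketch rather than a measured fix: the loop above decodes every byte as UTF-8 through codecs.open, which is likely where much of the time goes, while "wc -l" only counts newline bytes. The version below reads each file in binary chunks and counts b"\n" without decoding. The function names count_lines_binary and count_lines_in_tree and the 1 MiB chunk size are assumptions made for illustration; they are not part of the original code.

```python
import os


def count_lines_binary(path, chunk_size=1024 * 1024):
    """Count newline bytes in one file, reading it in binary chunks.

    chunk_size (1 MiB here) is an arbitrary illustrative choice.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            count += chunk.count(b"\n")
    return count


def count_lines_in_tree(search_path, file_type=".txt"):
    """Sum the newline counts of every matching file under search_path."""
    total = 0
    for folder, _dirs, files in os.walk(search_path):
        for name in files:
            if name.endswith(file_type):
                total += count_lines_binary(os.path.join(folder, name))
    return total
```

Like "wc -l", this counts newline characters, so a final line that lacks a trailing newline is not counted; the original for-loop would count it. Calling count_lines_in_tree(search_path) with the search_path defined above would replace the body of line_counter().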

0 answers:

No answers