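I'm trying to count all the lines of all the files in a given directory. The code works as expected, but it seems very slow. The file is large (3 GB); in my terminal I can run "wc -l file.txt" and count all the lines (87,000,000) within seconds. My Python code took 8 minutes 19 seconds to finish. Is there any way I can improve the code to make it faster? The print(counter) call is only there so I can see that the process is running.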
import os
from datetime import datetime
import codecs

start_time = datetime.now()
search_path = "/home/williams/Desktop/DB2"
file_type = ".txt"

def line_counter():
    counter = 0
    for folder, dirs, files in os.walk(search_path):
        for file in files:
            if file.endswith(file_type):
                fullpath = os.path.join(folder, file)
                with codecs.open(fullpath, 'r', encoding='utf-8', errors='ignore') as my_file:
                    for line in my_file:
                        counter += 1
                        print(counter)
    print('I found: ', counter, "lines in the DB!")

line_counter()
elapsed_time = datetime.now() - start_time
print('I counted all lines without problems')
print('The search took: ', elapsed_time, 'to complete')
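
For comparison, here is a minimal sketch of one way the count might be sped up, assuming the slowdown comes mostly from decoding every line as UTF-8 and from printing inside the loop. It reads each file in binary chunks and counts newline bytes, which is roughly what "wc -l" does. The helper names count_newlines and count_lines_in_tree and the 1 MiB chunk size are illustrative choices, not part of the original code, and no timing has been measured for it.

```python
import os

def count_newlines(path, chunk_size=1024 * 1024):
    # Count newline bytes in one file by reading fixed-size binary chunks.
    # No decoding is done: in UTF-8 the byte 0x0A only ever encodes a newline.
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count

def count_lines_in_tree(search_path, file_type=".txt"):
    # Walk the directory tree and sum the counts of all matching files.
    total = 0
    for folder, _dirs, files in os.walk(search_path):
        for name in files:
            if name.endswith(file_type):
                total += count_newlines(os.path.join(folder, name))
    return total

if __name__ == "__main__":
    # Assumed path, taken from the question's search_path.
    print(count_lines_in_tree("/home/williams/Desktop/DB2"))
```

Like "wc -l", this counts newline characters, so a last line with no trailing newline is not counted. The result can also differ from the original loop if a file contains other line-break characters that the text-mode reader treats as line endings.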