Question

我在python中有以下作业，它将200,000个文件附加在一起（预期卷为13GB）。我在笔记本电脑上运行它。因为它的进展非常缓慢（我每秒说5个文件），我想确保它已经过优化。我有什么方法可以改善这项工作的表现吗？

from __future__ import print_function
from xml.dom import minidom
import os

f=open('C:\\temp\\temp.xml','w')
f.write("")
f.close();

counter = 0
for root, dirs, files in os.walk("C:\Users\username\Desktop\mydata"):
    for file in files:
        if file.endswith(".xml"):

            xmldoc = minidom.parse(os.path.join(root, file))
            article = xmldoc.getElementsByTagName('article').item(0)
            with open("C:\\temp\\temp.xml", "a") as myfile:
                myfile.write(article.toxml('utf-8'))
            counter+=1
            print(counter)

Answer 1

我的代码的巨大瓶颈是minidom库。我切换到lxml，代码得到（严重）加快了100倍

Python：优化追加200,000个文件的大工作

1 个答案: