I have more than 44K doc files waiting to be converted to docx. The code I use to convert a single doc file is as follows:
from win32com import client

def doc2docx(doc_name):
    word = client.Dispatch("Word.Application")
    doc = word.Documents.Open(doc_name)
    docx_name = doc_name.replace(".doc", ".docx")
    doc.SaveAs(docx_name, 16)
    doc.Close()
    word.Quit()
I tried the following code to convert a subset of 10 doc files:
from glob import glob
from time import time

paths = glob("U:\\WordDocuments\\*.doc")
start = time()
counter = 0
for i in paths:
    doc2docx(i)
    counter += 1
    print(counter)
end = time()
duration = end - start
print("It took", duration, "seconds to process 10 doc files.")
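The code above runs without errors. However, it took more than 3 minutes to convert 10 doc files. How can I speed this up? I thought of multithreading or multiprocessing, but I don't know how to implement them. Thanks!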
Answer 0 (score: 0)
from win32com import client
from glob import glob
from time import time
from multiprocessing import Pool

def doc2docx(doc_name):
    # Each call starts Word, converts one file, then quits Word again.
    word = client.Dispatch("Word.Application")
    doc = word.Documents.Open(doc_name)
    docx_name = doc_name.replace(".doc", ".docx")
    doc.SaveAs(docx_name, 16)  # 16 = wdFormatDocumentDefault (.docx)
    doc.Close()
    word.Quit()

def pool_processing_complete(results):
    # Runs once in the parent process after all files have been processed.
    duration = time() - start
    print("It took", duration, "seconds to process", len(results), "doc files.")

if __name__ == "__main__":
    # This guard is required on Windows, where each worker re-imports the module.
    paths = glob("U:\\WordDocuments\\*.doc")
    start = time()
    pool = Pool()
    r = pool.map_async(doc2docx, paths, callback=pool_processing_complete)
    r.wait()
    pool.close()
    pool.join()
Use a multiprocessing pool; the code above is an example.
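A large part of the time per file is probably spent starting and quitting Word, so it may help to give each worker a batch of files and one Word instance for the whole batch. The following is only a sketch under that assumption, not a measured result. The helpers docx_path_for, split_into_batches and convert_batch, and the worker count of 4, are hypothetical names and values chosen for illustration. It uses client.DispatchEx so that each worker gets its own Word instance, and it needs Windows with pywin32 and Word installed.

```python
import os
from glob import glob
from multiprocessing import Pool


def docx_path_for(doc_path):
    """Return the target path: same name, extension swapped to .docx."""
    root, _ext = os.path.splitext(doc_path)
    return root + ".docx"


def split_into_batches(items, n_batches):
    """Deal items round-robin into at most n_batches non-empty lists."""
    batches = [items[i::n_batches] for i in range(n_batches)]
    return [batch for batch in batches if batch]


def convert_batch(doc_paths):
    """Convert every file in doc_paths with one Word instance; return the count."""
    # Imported inside the worker so the module itself loads without pywin32.
    from win32com import client

    word = client.DispatchEx("Word.Application")  # separate instance per worker
    converted = 0
    try:
        for doc_path in doc_paths:
            doc = word.Documents.Open(doc_path)
            try:
                doc.SaveAs(docx_path_for(doc_path), 16)  # 16 = wdFormatDocumentDefault
                converted += 1
            finally:
                doc.Close()
    finally:
        word.Quit()  # quit Word even if a file in the batch fails
    return converted


if __name__ == "__main__":
    paths = glob("U:\\WordDocuments\\*.doc")
    n_workers = 4  # assumed value; tune to the machine
    batches = split_into_batches(paths, n_workers)
    if batches:
        with Pool(len(batches)) as pool:
            counts = pool.map(convert_batch, batches)
        print("Converted", sum(counts), "files.")
```

In this sketch a file that fails to open stops the rest of its batch; for 44K files it would be reasonable to catch the exception per file and log the path instead. Using os.path.splitext also avoids rewriting a ".doc" that happens to appear elsewhere in the path.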