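The task is pretty simple: extract images from Word documents. Since I have a lot of files (more than 10,000, for example), I decided to use multiprocessing for this task.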
import docx2txt
import os
from multiprocessing import Pool

ABS_PATH = os.path.dirname(os.path.realpath(__file__))


def extract_image(docs):
    for root, dirs, filenames in os.walk(docs):
        for f in filenames:
            directory = os.path.join(ABS_PATH, "images/")
            docx2txt.process("%s%s" % (docs, f), directory)


def get_docs():
    source = os.path.join(ABS_PATH, 'docs/')
    return source


if __name__ == "__main__":
    docs = get_docs()
    pool = Pool()
    pool.map(extract_image, docs)
    pool.close()
    pool.join()
I expected the images to be extracted to the docs folder, but instead I get:
PermissionError: [Errno 13] Permission denied: '\pagefile.sys'
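
A likely explanation for the error: get_docs() returns a single string, and pool.map iterates over whatever it is given, so extract_image is called once per character of the path. When that character is a path separator such as '\', os.walk traverses the root of the drive, which would explain the attempt to open pagefile.sys. Below is a minimal sketch of one way to restructure the script so that the pool receives a list of file paths instead. The helper names list_docx, extract_images and run are hypothetical, the ".docx" filter is an assumption about the input folder, and the only docx2txt call used is the same docx2txt.process(path, directory) as in the script above.

```python
import os
from functools import partial
from multiprocessing import Pool


def list_docx(docs_dir):
    """Return the full paths of all .docx files below docs_dir, sorted."""
    paths = []
    for root, _dirs, filenames in os.walk(docs_dir):
        for name in filenames:
            # Assumption: only .docx files should be handed to docx2txt.
            if name.lower().endswith(".docx"):
                # Join with root (not docs_dir) so files in subfolders resolve.
                paths.append(os.path.join(root, name))
    return sorted(paths)


def extract_images(path, out_dir):
    """Worker: extract the images of one document into out_dir."""
    # Imported lazily so that listing files works even without docx2txt installed.
    import docx2txt
    docx2txt.process(path, out_dir)
    return path


def run(docs_dir, out_dir):
    """Extract images from every .docx under docs_dir using a process pool."""
    os.makedirs(out_dir, exist_ok=True)
    paths = list_docx(docs_dir)
    with Pool() as pool:
        # map() iterates over its second argument: a list of paths gives one
        # task per file, whereas a plain string gives one task per character.
        return pool.map(partial(extract_images, out_dir=out_dir), paths)
```

To use it, call run(os.path.join(ABS_PATH, "docs"), os.path.join(ABS_PATH, "images")) from under the same if __name__ == "__main__": guard as in the original script; the guard is still needed on Windows, where worker processes re-import the main module.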