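I am using win32com to convert .docx files to .txt files. It works fine until it hits a file whose name contains a Spanish character that it does not recognize.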
DOC_FILEPATH = r"C:\Temp\Hugo- Ortíz -.docx"
s = find_between_r(DOC_FILEPATH, '.', '')
FILETXT = DOC_FILEPATH.strip(s)
FILETXT = FILETXT + "txt"
doc = win32com.client.GetObject(DOC_FILEPATH)
text = doc.Range().Text
with open(FILETXT, "wb") as f:
    f.write(text.encode("utf-8"))
When win32com.client reads DOC_FILEPATH, I get this error:
moniker, i, bindCtx = pythoncom.MkParseDisplayName(Pathname)
pywintypes.com_error: (-2147221014, 'El moniker no puede abrir un archivo', None, None)
Is there a way to read the file without renaming it?
Answer 0 (score: 2)
This is not how Word Automation works. Check the Word object model ([MS.Docs]: Word) for more details.
You should create a Word.Application instance, which will handle the documents.
I adapted [SO]: Python - Using win32com.client to accept all changes in Word Documents and tested it for you on a dummy document.
code.py:
#!/usr/bin/env python3
# -*- coding: cp1252 -*-

import sys
import os
import win32com.client as w32comcl


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    doc_path = r"Documento ficticío.docx"
    txt_path = os.path.splitext(doc_path)[0] + ".txt"
    word = w32comcl.Dispatch("Word.Application")
    try:
        word.Visible = False
        doc = word.Documents.Open(os.path.abspath(doc_path))
        try:
            text = doc.Range().Text
            with open(txt_path, "wb") as f:
                f.write(text.encode("utf8"))
        finally:
            doc.Close(False)
    finally:
        word.Application.Quit()
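The script above builds the output path with os.path.splitext instead of the strip-based approach from the question. A minimal sketch of why that matters follows. It assumes that the question's undefined find_between_r helper returns the extension ("docx"); txt_path_for and the "cod.docx" file name are hypothetical, made up for illustration. ntpath is used so that Windows path rules apply on any platform.

```python
import ntpath  # Windows flavour of os.path, importable on any platform


def txt_path_for(doc_path):
    """Return doc_path with its extension replaced by .txt (hypothetical helper)."""
    root, _ext = ntpath.splitext(doc_path)
    return root + ".txt"


# Path taken from the question.
print(txt_path_for(r"C:\Temp\Hugo- Ortíz -.docx"))

# str.strip removes a *set of characters* from both ends, not a suffix,
# so it can eat into the file name itself. "cod.docx" is a made-up name
# whose stem consists only of characters found in "docx".
print("cod.docx".strip("docx") + "txt")
```

With the path from the question the strip-based version happens to work, because the name neither starts nor ends (before the dot) with one of the letters d, o, c, x. splitext does not depend on that kind of luck.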
Output:
(py35x64_test) e:\Work\Dev\StackOverflow\q049179872>dir /b
code.py
Documento ficticío.docx

(py35x64_test) e:\Work\Dev\StackOverflow\q049179872>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

(py35x64_test) e:\Work\Dev\StackOverflow\q049179872>dir /b
code.py
Documento ficticío.docx
Documento ficticío.txt

(py35x64_test) e:\Work\Dev\StackOverflow\q049179872>type "Documento ficticío.txt"
P├írrafo fictic├¡o0: 1234567890qwertyuioopasdfghjklzxcvbnm.
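
The mangled accents in the type output most likely come from the console, not from the file: the script writes UTF-8, while type shows the raw bytes in the console's OEM code page. The sketch below reproduces the effect in pure Python. The choice of cp437 is an assumption (the thread does not say which code page the console used), and the sample string is reconstructed from the output above.

```python
# Text as Word would return it (reconstructed from the output above).
text = "Párrafo ficticío"

# What the script writes to the .txt file.
utf8_bytes = text.encode("utf-8")

# What a console using an OEM code page displays for those bytes.
# cp437 is an assumption; the actual console code page is not stated.
as_console = utf8_bytes.decode("cp437")
print(as_console)

# Decoding with the right encoding gives the original text back,
# so the file content itself is intact.
print(utf8_bytes.decode("utf-8") == text)
```

Reading the file back with open(txt_path, encoding="utf-8") should therefore return the accented text unchanged.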