I am using Python 3.5.
To extract the text content of a Word document, I am using xml.etree.ElementTree. I cannot write the resulting XML to another file.
Here is my code:
import zipfile
import xml.etree.ElementTree
with zipfile.ZipFile('<path to docx file>') as docx:
    tree = xml.etree.ElementTree.XML(docx.read('word/document.xml'))
I tried two approaches:
tree.write('<path to file>', encoding='utf8')
and
xml.etree.ElementTree.write('<path to file>')
But both approaches raise an error:
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'write'
Please help.
Answer 0 (score: 0)
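By convention, tree is an ElementTree object and root is its root Element. XML() returns only the root Element, which has no write() method.
Loading with xml.etree.ElementTree.parse() instead returns a proper ElementTree.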
# XML() -> returns root Element
root = xml.etree.ElementTree.XML(docx.read('word/document.xml'))
print(root)
<Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}document' at 0x7f5f117ae5d0>
# parse() -> returns tree ElementTree
tree = xml.etree.ElementTree.parse(docx.open('word/document.xml'))
print(tree)
<xml.etree.ElementTree.ElementTree object at 0x7f5f11805890>
root = tree.getroot()
print(root)
<Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}document' at 0x7f5f11805950>
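If you already hold a root Element from XML(), another option is to wrap it in an ElementTree, which does provide write(). The following is a minimal sketch, not part of the original answer: xml_bytes is a hypothetical stand-in for what docx.read('word/document.xml') returns, and an in-memory buffer replaces the output file so the snippet runs on its own.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bytes returned by docx.read('word/document.xml').
xml_bytes = b'<document><body><p>Hello</p></body></document>'

root = ET.XML(xml_bytes)       # XML() returns an Element, which has no write()
tree = ET.ElementTree(root)    # wrapping the Element gives an ElementTree

buffer = io.BytesIO()          # in-memory target; a file path works the same way
tree.write(buffer, encoding='utf-8', xml_declaration=True)
output = buffer.getvalue()
```

Passing a file path instead of buffer writes the same bytes to disk.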
Then:
import zipfile
import xml.etree.ElementTree
with zipfile.ZipFile('<path to docx file>') as docx:
    tree = xml.etree.ElementTree.parse(docx.open('word/document.xml'))
    tree.write('<path to file>')
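To try the approach above without a real Word file, here is a self-contained sketch. The helper name save_document_xml, the temporary paths, and the tiny stand-in archive are all assumptions made for illustration; a real .docx contains many more parts than word/document.xml.

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'


def save_document_xml(docx_path, out_path):
    """Parse word/document.xml from a .docx archive and write it to out_path."""
    with zipfile.ZipFile(docx_path) as docx:
        with docx.open('word/document.xml') as member:
            tree = ET.parse(member)  # parse() returns an ElementTree
    tree.write(out_path, encoding='utf-8', xml_declaration=True)
    return tree.getroot().tag


# Build a tiny stand-in archive (hypothetical content, not a valid Word file).
workdir = tempfile.mkdtemp()
docx_path = os.path.join(workdir, 'sample.docx')
out_path = os.path.join(workdir, 'document.xml')
with zipfile.ZipFile(docx_path, 'w') as archive:
    archive.writestr(
        'word/document.xml',
        '<w:document xmlns:w="%s"><w:body/></w:document>' % W_NS,
    )

root_tag = save_document_xml(docx_path, out_path)
```

Note that ElementTree may rewrite namespace prefixes (for example w: becoming ns0:) when serializing; the element names and namespace URIs are unchanged.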