How to read a numbered list from a .docx (Word) file
bulletsquestions.docx:
1. this is a question text
A. Option first
B. Option second
C. Option third
D. Option fourth
E. Option fifth
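This assumes the `1.` and `A.` labels were produced by Word's automatic numbering. In that case they are not stored as characters in `w:t` text runs. Each list paragraph instead carries a `w:numPr` element in its paragraph properties: `w:numId` selects a numbering definition and `w:ilvl` gives the list level. The snippet below parses a hand-written, simplified XML fragment and reports those two values next to each paragraph's text. The fragment is hypothetical and was not extracted from `bulletsquestions.docx`; the real file may encode the options differently, for example as a separate list. The helper `list_paragraphs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment of word/document.xml (hypothetical,
# not copied from bulletsquestions.docx). The "1." / "A." labels are NOT
# stored as text; numId/ilvl only point at a numbering definition.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FRAGMENT = f"""<w:body xmlns:w="{W_NS}">
  <w:p>
    <w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>
    <w:r><w:t>this is a question text</w:t></w:r>
  </w:p>
  <w:p>
    <w:pPr><w:numPr><w:ilvl w:val="1"/><w:numId w:val="1"/></w:numPr></w:pPr>
    <w:r><w:t>Option first</w:t></w:r>
  </w:p>
</w:body>"""

W = "{%s}" % W_NS


def list_paragraphs(xml_text):
    """Yield (numId, ilvl, text) per paragraph; numId and ilvl are None
    when the paragraph has no w:numPr (i.e. is not a list item)."""
    root = ET.fromstring(xml_text)
    for p in root.iter(W + "p"):
        num_pr = p.find(f"{W}pPr/{W}numPr")
        num_id = ilvl = None
        if num_pr is not None:
            num_id_el = num_pr.find(W + "numId")
            ilvl_el = num_pr.find(W + "ilvl")
            num_id = num_id_el.get(W + "val") if num_id_el is not None else None
            ilvl = int(ilvl_el.get(W + "val")) if ilvl_el is not None else 0
        # the visible text is only what sits inside w:t elements
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        yield num_id, ilvl, text


for num_id, ilvl, text in list_paragraphs(FRAGMENT):
    print(num_id, ilvl, text)
```

Reading only `w:t` elements, as the question's script does, therefore returns the paragraph text without the list labels.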
stack.py:
import zipfile
from xml.etree.ElementTree import XML

WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'

# A .docx file is a ZIP archive; the body text lives in word/document.xml
with zipfile.ZipFile('bulletsquestions.docx') as sourceFile:
    xml_content = sourceFile.read('word/document.xml')

tree = XML(xml_content)
tex = ""
# Element.getiterator() was removed in Python 3.9; Element.iter() replaces it
for paragraph in tree.iter(PARA):
    for read_item in paragraph.iter(TEXT):
        tex += read_item.text or ""  # an empty <w:t/> has text None
print(tex)
Result: