How to read a numbered list from a docx (Word file)

Date: 2017-11-18 14:52:31

Tags: python xml ms-word

How can I read a numbered list from a docx (Word) file?

bulletsquestions.docx:

  1. this is a question text 
    A.  Option first
    B.  Option second
    C.  Option third
    D.  Option fourth
    E.  Option fifth

stack.py:

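import zipfile
from xml.etree.ElementTree import XML

# A .docx file is a zip archive; the document body is stored in word/document.xml
sourceFile = zipfile.ZipFile('bulletsquestions.docx')
xml_content = sourceFile.read('word/document.xml')
sourceFile.close()

WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'

tree = XML(xml_content)
tex = ""
# iter() replaces getiterator(), which was removed in Python 3.9
for paragraph in tree.iter(PARA):
    for read_item in paragraph.iter(TEXT):
        # an empty <w:t> element has .text == None, so fall back to ""
        tex = tex + (read_item.text or "")
print(tex)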

Result:

  1. this is a question text Option firstOption secondOption thirdOption fourthOption fifth
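
For context on why the labels do not come back with the text: Word does not store automatic list numbers or letters inside the `w:t` elements. A list paragraph instead carries a `w:numPr` element in its paragraph properties, with `w:ilvl` (nesting level) and `w:numId` (which list it belongs to), and the label formats are defined separately in word/numbering.xml. The sketch below rebuilds the labels by counting paragraphs per list and level. It rests on assumptions that are not confirmed by the file in the question: level 0 is rendered as "1.", deeper levels as "A.", every list starts at 1, and numbering is set directly on the paragraph rather than through a style. The names `format_label`, `read_numbered_paragraphs` and `read_docx_list` are made up for this illustration.

```python
import zipfile
from xml.etree.ElementTree import XML

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def format_label(level, count):
    # Assumption: level 0 is decimal ("1."), deeper levels are upper-case
    # letters ("A."). The real formats live in word/numbering.xml, which
    # this sketch does not read.
    if level == 0:
        return '%d.' % count
    return chr(ord('A') + (count - 1) % 26) + '.'


def read_numbered_paragraphs(xml_content):
    """Return one string per paragraph, with a reconstructed list label."""
    tree = XML(xml_content)
    counters = {}  # (numId, level) -> number of items seen so far
    lines = []
    for paragraph in tree.iter(W + 'p'):
        text = ''.join(t.text or '' for t in paragraph.iter(W + 't'))
        num_pr = paragraph.find(W + 'pPr/' + W + 'numPr')
        if num_pr is None:
            # Not a list paragraph (or numbered only through its style).
            lines.append(text)
            continue
        ilvl = num_pr.find(W + 'ilvl')
        num_id = num_pr.find(W + 'numId')
        level = int(ilvl.get(W + 'val')) if ilvl is not None else 0
        list_id = num_id.get(W + 'val') if num_id is not None else None
        # Advance the counter for this level and restart all deeper levels.
        counters[(list_id, level)] = counters.get((list_id, level), 0) + 1
        for key in list(counters):
            if key[0] == list_id and key[1] > level:
                del counters[key]
        label = format_label(level, counters[(list_id, level)])
        lines.append('    ' * level + label + ' ' + text)
    return lines


def read_docx_list(path):
    # A .docx file is a zip archive; the body is in word/document.xml.
    with zipfile.ZipFile(path) as source_file:
        return read_numbered_paragraphs(source_file.read('word/document.xml'))
```

Calling `read_docx_list('bulletsquestions.docx')` and printing each returned line would then give one paragraph per line with its label, provided the document matches the assumptions above. For exact labels, the `w:numFmt` and `w:start` values for each level would have to be looked up in word/numbering.xml.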

0 answers:

No answers