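I am trying to extract data from a Word document. The document is a questionnaire: it contains tables, and the tables contain checkboxes. I want to extract the labels of the checked checkboxes.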
```python
from docx import Document
import pandas as pd

wordDoc = Document(path + file)

# I am only interested in the first table
table = wordDoc.tables[0]

# The table's first column contains the questions; the second column contains the answers.
# I only need the answers.
cells = table.columns[1].cells

a = []
for cell in cells:
    checkboxes = cell._element.xpath('.//w:checkBox')
    if checkboxes:
        for checkbox in checkboxes:
            # here should go some code: get the label of each checked checkbox
            pass
    else:
        a.append(cell.text)
```
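One possible way to fill in the placeholder is sketched below. It assumes the document uses legacy form-field checkboxes (the `w:checkBox` element the XPath above already matches) rather than newer content-control checkboxes, and it assumes each checkbox's label is the text that follows it in the same paragraph, up to the next checkbox. The ticked state is read from the `w:checked` child, falling back to `w:default` when `w:checked` is absent. The helper names `is_checked` and `checked_labels` are hypothetical. They only use the ElementTree-style `find`/`iter`/`get` methods, which both the lxml element behind python-docx's `cell._element` and a stdlib `ElementTree` element provide.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace in Clark notation ("{uri}tag")
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# On/off values that mean "on"; a missing w:val attribute also means "on"
_TRUE_VALUES = {"1", "true", "on"}


def is_checked(checkbox):
    """Return True if a legacy form-field <w:checkBox> is ticked.

    An explicit <w:checked> child wins; otherwise fall back to <w:default>.
    """
    state = checkbox.find(W + "checked")
    if state is None:
        state = checkbox.find(W + "default")
    if state is None:
        return False
    return state.get(W + "val", "1").lower() in _TRUE_VALUES


def checked_labels(cell_element):
    """Collect the labels of ticked checkboxes inside a table cell (<w:tc>).

    Assumption: a label is the <w:t> text that follows its checkbox in the
    same paragraph, up to the next checkbox or the end of the paragraph.
    """
    labels = []
    for para in cell_element.iter(W + "p"):
        state = None  # None until the first checkbox in this paragraph
        text = []     # <w:t> text seen since the most recent checkbox
        for node in para.iter():  # document order
            if node.tag == W + "checkBox":
                if state:  # close the label of the previous, ticked box
                    labels.append("".join(text).strip())
                state = is_checked(node)
                text = []
            elif node.tag == W + "t" and state is not None:
                text.append(node.text or "")
        if state:  # the last checkbox in the paragraph was ticked
            labels.append("".join(text).strip())
    return labels


if __name__ == "__main__":
    # Hand-written sample cell: a ticked "Yes" box followed by an unticked "No" box
    sample = (
        '<w:tc xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:p>"
        '<w:r><w:fldChar w:fldCharType="begin"><w:ffData><w:checkBox>'
        '<w:default w:val="0"/><w:checked/></w:checkBox></w:ffData></w:fldChar></w:r>'
        "<w:r><w:instrText> FORMCHECKBOX </w:instrText></w:r>"
        '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
        "<w:r><w:t> Yes</w:t></w:r>"
        '<w:r><w:fldChar w:fldCharType="begin"><w:ffData><w:checkBox>'
        '<w:default w:val="0"/></w:checkBox></w:ffData></w:fldChar></w:r>'
        "<w:r><w:instrText> FORMCHECKBOX </w:instrText></w:r>"
        '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
        "<w:r><w:t> No</w:t></w:r>"
        "</w:p></w:tc>"
    )
    print(checked_labels(ET.fromstring(sample)))  # ['Yes']
```

With these helpers, the checkbox branch of the loop in the question could reduce to `a.extend(checked_labels(cell._element))`. If the labels in your document sit before the boxes, or in separate paragraphs, the text-collection rule in `checked_labels` is the part to adapt.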