I am trying out the python-docx module. So far I have been able to extract specific paragraphs, as well as the whole text, from a Word file.
pip install --pre python-docx  # to install python-docx

from docx import Document

document = Document('file.docx')
document.paragraphs          # to extract paragraphs
document.paragraphs[2].text  # gives the text of the third paragraph
for par in document.paragraphs:  # to extract the whole text
    print(par.text)
# I tried the code below to find a specific term
for i in range(0, 50, 1):
    # == only matches when the whole paragraph is exactly 'Some-word'
    if document.paragraphs[i].text == 'Some-word':
        print(document.paragraphs[i].text)
I would like to find a specific word in the Word file, in highlighted form.
Answer 0 (score: 1)
This searches all paragraphs:
for par in document.paragraphs:  # check every paragraph
    if 'Some-word' in par.text:  # substring match, not equality
        print(par.text)
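If "in highlighted form" means marking the matches in the document itself, the search above could be extended roughly as follows. This is only a sketch under some assumptions: the helper names `highlight_term` and `highlight_in_file` and the output file name are made up for illustration, it relies on `run.font.highlight_color` and `WD_COLOR_INDEX` from python-docx, and it highlights the whole run that contains the term rather than just the word. A term that Word has split across several runs will not be caught.

```python
def highlight_term(paragraphs, term, color):
    """Set highlight `color` on every run whose text contains `term`.

    Returns the number of runs that were highlighted. A term that is
    split across several runs is not detected by this simple approach.
    """
    count = 0
    for par in paragraphs:
        if term not in par.text:
            continue  # skip paragraphs that do not contain the term
        for run in par.runs:
            if term in run.text:
                run.font.highlight_color = color
                count += 1
    return count


def highlight_in_file(src, dst, term):
    """Hypothetical wrapper: open `src`, highlight `term`, save as `dst`."""
    # Imported here so highlight_term can be used without python-docx.
    from docx import Document
    from docx.enum.text import WD_COLOR_INDEX

    document = Document(src)
    n = highlight_term(document.paragraphs, term, WD_COLOR_INDEX.YELLOW)
    document.save(dst)
    return n
```

It could then be called as `highlight_in_file('file.docx', 'file_highlighted.docx', 'Some-word')`. Note that `document.paragraphs` only covers body paragraphs, so text inside tables would need a separate loop.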