Question

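I want to extract the position of the bold words detected in a .docx file.

To do this, I used the docx library and managed to detect the words in bold. However, extracting only the words is not very useful, because the same word can appear elsewhere with different formatting.

For example:

Suppose my file.docx contains: "My cat is not a normal cat", where only the first "cat" is bold.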

from docx import Document

document = Document('/path/to/file.docx')

def bold(document):
    # Collect the text of every bold run in the document
    Listbolds = []
    for para in document.paragraphs:
        for run in para.runs:
            if run.bold:
                print(run.text)
                word = run.text
                Listbolds.append(word)
    return Listbolds

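This function gives me the word "cat" as output. But if I then filter my text to keep only the non-bold words using that result, I also remove the second "cat", which is not bold.

Any ideas on how to get only the position of this word? For example, getting 2 as the word position.

Thanks, everyone!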

Answer 1

I don't have the docx library available, but just from looking at the code, maybe change it to return a list of booleans?

from docx import Document

document = Document('/path/to/file.docx')

def get_bold_list(para):
    # One entry per run: True, False, or None (None = bold is inherited from the style)
    bold_list = []
    for run in para.runs:
        bold_list.append(run.bold)
    return bold_list

for para in document.paragraphs:
    bold_list = get_bold_list(para)
    # do something with bold_list
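
To go from per-run flags to a word position like the 2 asked for in the question, one possible approach is to walk the runs character by character and count whitespace-separated words. The sketch below is only an illustration under some assumptions: bold_word_positions is a hypothetical helper name, words are assumed to be separated by whitespace (so punctuation stays attached to its word), a word counts as bold only if every one of its characters sits in a run with run.bold set to True, and bold inherited from a style (run.bold is None) is treated as not bold. The stand-in runs are built with SimpleNamespace so the example runs without a .docx file.

```python
from types import SimpleNamespace


def bold_word_positions(runs):
    """Return (position, word) pairs for bold words, counting words from 1.

    `runs` is any iterable of objects with `.text` and `.bold` attributes,
    such as `para.runs` from python-docx.
    """
    words = []       # each entry is [word_text, all_characters_bold]
    in_word = False  # True while the previous character belonged to a word

    for run in runs:
        # Assumption: None (inherited formatting) is treated as not bold.
        is_bold = run.bold is True
        for ch in run.text:
            if ch.isspace():
                in_word = False
            elif in_word:
                # Continue the current word, even across a run boundary.
                words[-1][0] += ch
                words[-1][1] = words[-1][1] and is_bold
            else:
                # First character of a new word.
                words.append([ch, is_bold])
                in_word = True

    return [(i, text) for i, (text, bold) in enumerate(words, start=1) if bold]


# Stand-in for the runs of "My cat is not a normal cat" with the first "cat" bold.
example_runs = [
    SimpleNamespace(text="My ", bold=None),
    SimpleNamespace(text="cat", bold=True),
    SimpleNamespace(text=" is not a normal cat", bold=None),
]
print(bold_word_positions(example_runs))  # → [(2, 'cat')]
```

With a real document you would pass para.runs for each para in document.paragraphs. The positions are counted per paragraph, so keep the paragraph index alongside them if you need a location within the whole file.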
