Question

In theory, this code should get the document text and then find the text highlighted in yellow, but my problem is right at the start. I ran the code as is:

from docx import *
document = Document(r'filepath.docx')
words = document.xpath('//w:r', namespaces=document.nsmap)
WPML_URI = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main'
tag_rPr = WPML_URI + 'rPr'
tag_highlight = WPML_URI + 'highlight'
tag_val = WPML_URI + 'val'
tag_t = WPML_URI + 't'
for word in words:
    for rPr in word.findall(tag_rPr):
        high = rPr.findall(tag_highlight)
        for hi in high:
            if hi.attribute[tag_val] == 'yellow':
                print(word.find(tag_t).text.encode('utf-8').lower())

and got this as the error message:

AttributeError: 'Document' object has no attribute 'xpath'

The problem is apparently with the document.xpath(...) call, and I don't know how to solve it.

Answer 1

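The problem is that you are trying to do something with docx.Document that it does not support. If you check the documentation for Document, you will see that it has no .xpath method.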

If you need the words, you can get them through the Document.paragraphs property, which is described in the same documentation.
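A rough sketch of that route, in case it helps: walk Document.paragraphs, then each paragraph's runs, and compare run.font.highlight_color against a WD_COLOR_INDEX member. The helper names highlighted_text and yellow_text_in_file are made up for illustration, and this only covers top-level body paragraphs (text inside tables, headers or footnotes would need extra handling).

```python
def highlighted_text(document, color):
    """Return the text of every body run whose highlight equals `color`.

    `document` is expected to behave like a python-docx Document:
    .paragraphs -> .runs -> .font.highlight_color / .text
    """
    found = []
    for paragraph in document.paragraphs:
        for run in paragraph.runs:
            # highlight_color is None when the run is not highlighted
            if run.font.highlight_color == color and run.text:
                found.append(run.text)
    return found


def yellow_text_in_file(path):
    """Hypothetical convenience wrapper: yellow-highlighted text of one file."""
    # Imported here so highlighted_text() itself has no hard dependency
    # on python-docx being installed.
    from docx import Document
    from docx.enum.text import WD_COLOR_INDEX

    return highlighted_text(Document(path), WD_COLOR_INDEX.YELLOW)
```

Calling yellow_text_in_file('xyz.docx') would then give a list of strings, one per highlighted run.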

Answer 2

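@PirateNinjas is right. A Document object is not a subclass of lxml.etree._Element, so it has no .xpath() method. That is what the AttributeError is telling you: every method on an object is an attribute (just like an instance variable), and if the name you ask for does not exist, you get this error.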
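To make the mechanism concrete, here is a toy illustration with stand-in classes (Element and Wrapper are invented for this example and are not python-docx or lxml classes). The wrapper holds an element but does not inherit from it, so looking up xpath on the wrapper fails while the same lookup on the wrapped element succeeds.

```python
class Element:
    """Stand-in for an lxml element: it does define xpath()."""

    def xpath(self, expr):
        return []  # a real element would return matching nodes


class Wrapper:
    """Stand-in for a Document-like object that wraps an element."""

    def __init__(self):
        self._element = Element()


def try_xpath(obj, expr):
    """Call obj.xpath(expr); return the error text if the attribute is missing."""
    try:
        return obj.xpath(expr)
    except AttributeError as exc:
        return str(exc)


wrapper = Wrapper()
print(hasattr(wrapper, "xpath"))           # False: no such attribute on the wrapper
print(hasattr(wrapper._element, "xpath"))  # True: the wrapped element has it
print(try_xpath(wrapper, "//w:r"))         # the AttributeError message text
```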

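However, Document._element is an instance of an _Element subclass and will probably work for you. At the very least it will not give you this error, and it should move you further in the right direction. This code should give you all the <w:r> elements in the main story of the document (that is, the document body, but not headers, footnotes, and so on):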

rs = document._element.xpath("//w:r")

Answer 3

from docx import Document

document = Document('xyz.docx')
# Document itself has no .xpath(); its underlying lxml element does.
words = document._element.xpath('//w:r')
WPML_URI = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
tag_rPr = WPML_URI + 'rPr'
tag_highlight = WPML_URI + 'highlight'
tag_val = WPML_URI + 'val'
tag_t = WPML_URI + 't'
for word in words:
    for rPr in word.findall(tag_rPr):
        for hi in rPr.findall(tag_highlight):
            # w:val holds the name of the highlight colour
            if hi.attrib.get(tag_val) in ('yellow', 'blue', 'green'):
                t = word.find(tag_t)
                # some runs carry no <w:t> child, so guard against None
                if t is not None and t.text:
                    print(t.text)

If you only want to extract text highlighted in one colour, say yellow, use this line for the test instead:

if hi.attrib[tag_val] == 'yellow':
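For comparison, the same search can be sketched with only the standard library, without python-docx or lxml. This is a swapped-in approach, not what the answers above use: it relies on a .docx file being a zip archive whose body lives in word/document.xml, and the function name highlighted_runs is hypothetical. Like the XPath above, it only sees the main document story.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace in the {uri}tag form ElementTree expects
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def highlighted_runs(docx_file, colors=("yellow",)):
    """Return the text of each <w:r> whose <w:highlight w:val=...> is in `colors`.

    `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    found = []
    for run in root.iter(W + "r"):
        highlight = run.find(W + "rPr/" + W + "highlight")
        if highlight is None or highlight.get(W + "val") not in colors:
            continue  # run is not highlighted in a wanted colour
        # a run may contain several <w:t> pieces, or none at all
        text = "".join(t.text or "" for t in run.iter(W + "t"))
        if text:
            found.append(text)
    return found
```

Passing colors=("yellow", "blue", "green") mirrors the multi-colour test in the code above.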

AttributeError with a .docx Document: no attribute 'xpath'
