Question

Here is a sample of the XML:

<w:p>
   <w:r>
      <w:rPr>
      <w:b/>
   <w:t> There was a rich girl </w:t>
   </w:r>
   <w:r>
      <w:rPr>
      <w:bCs/>
   <w:t> Nananananan </w:t>
   </w:r>
   <w:r>
      <w:rPr>
      <w:b/>
      <w:bCs/>
   <w:t>If I had all the money in the world </w:t>
   </w:r>
</w:p>

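I want to extract the text "There was a rich girl Nananananan", but not "If I had all the money..". I need to extract the text that corresponds to a <w:b> or <w:bCs> tag, but when both tags appear together in the same run, I need to skip it.

In other words, extract the text only when exactly one of w:b or w:bCs is present.

Here is what I have so far: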

text2=" "
w = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'    
for r in p.xpath('.//w:t',namespaces={'w': w}):  
    if r.xpath('..//w:b|..//w:bCs[@w:val="0"]',namespaces={'w': w}):  
       text2 += r.text

This only checks whether w:b or w:bCs exists, so it also matches when both are present. How can I add a condition that makes the test exclusive?

Answer 1

'(..//w:b|..//w:bCs[@w:val="0"])[count(./..//w:b|./../w:bCs[@w:val="0"])=1]'

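If the result contains more than one node, count(./..//w:b|./../w:bCs[@w:val="0"])=1 evaluates to false, and the [false] predicate makes the main sequence return nothing.

Edit: First, the XML is really broken. Where is the closing tag for w:rPr? Second, w:bCs has no w:val attribute in your sample, yet your code tests [@w:val="0"]. It is still possible to achieve what you want: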

for r in p.xpath('.//w:t[./ancestor::w:r[count(.//w:b | .//w:bCs)=1]]',namespaces={'w': w}):
    text2 += r.text
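
For reference, here is a self-contained sketch of that loop. It assumes a repaired copy of the sample in which every w:rPr is closed; the `SAMPLE` string and the `NS` mapping are illustrative names that do not appear in the question.

```python
from lxml import etree

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Assumption: a well-formed version of the sample, with each w:rPr closed.
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:b/></w:rPr><w:t>There was a rich girl </w:t></w:r>
  <w:r><w:rPr><w:bCs/></w:rPr><w:t>Nananananan </w:t></w:r>
  <w:r><w:rPr><w:b/><w:bCs/></w:rPr><w:t>If I had all the money in the world </w:t></w:r>
</w:p>"""

p = etree.fromstring(SAMPLE)

# Keep a w:t only when its enclosing run holds exactly one of w:b / w:bCs.
matches = p.xpath(
    ".//w:t[ancestor::w:r[count(.//w:b | .//w:bCs) = 1]]",
    namespaces=NS,
)
text2 = "".join(t.text for t in matches)
print(repr(text2))
```

The third run contains both tags, so its count is 2 and its text is skipped.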

Edit 2: A working loop with the additional val="0" condition that also copes with the broken XML:

for r in p.xpath('.//w:t[./ancestor::w:r[(.//w:b or .//w:bCs[@w:val="0"]) and count(.//w:b|.//w:bCs)=1]]',namespaces={'w': w}):
     text2 += r.text
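
If the XPath predicate feels dense, the same exclusivity rule can be written as a plain Python check on each run. This is a different technique from the XPath above, not a replacement for it: a minimal sketch using the standard-library `xml.etree.ElementTree` instead of lxml, again assuming a well-formed sample. The function name `exclusive_bold_text` is hypothetical, and the sketch ignores any w:val attribute.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Assumption: a well-formed version of the sample, with each w:rPr closed.
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:b/></w:rPr><w:t>There was a rich girl </w:t></w:r>
  <w:r><w:rPr><w:bCs/></w:rPr><w:t>Nananananan </w:t></w:r>
  <w:r><w:rPr><w:b/><w:bCs/></w:rPr><w:t>If I had all the money in the world </w:t></w:r>
</w:p>"""


def exclusive_bold_text(paragraph):
    """Join the text of runs that contain w:b or w:bCs, but not both."""
    parts = []
    for run in paragraph.findall(".//w:r", NS):
        has_b = run.find(".//w:b", NS) is not None
        has_bcs = run.find(".//w:bCs", NS) is not None
        if has_b != has_bcs:  # exclusive or: exactly one tag is present
            parts.append(run.findtext("w:t", default="", namespaces=NS))
    return "".join(parts)


result = exclusive_bold_text(ET.fromstring(SAMPLE))
print(repr(result))
```

Comparing the two booleans with `!=` is the exclusive-or test, which plays the role of `count(...) = 1` in the XPath version.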

Conditioning XPath on the presence of siblings, lxml
