Question

I'm trying to build an XPath query that basically selects everything except certain nodes.

Here is the XML I'm working with:

<?xml version="1.0" encoding="UTF-8"?>

<task>
  <title id="30014">Instructions</title>
  <taskbody>
    <context>
      <p>Your box has a document.</p>
      <p audience="print">To get the document:</p>
      <p audience="web">
        <xref href="/node/6308" scope="external">Click here</xref> to get the document.
      </p>
    </context>
    <steps audience="print">
      <step>
        <cmd>Go to 
          <u>www.google.com</u>.
        </cmd>
      </step>
      <step>
        <cmd>Click on the “Resource” button.</cmd>
        <info>
          <fig frame="all">
            <image href="resource.ai" height="1.650in" width="4.500in"/>
          </fig>
        </info>
      </step>
      <step>
        <cmd>Click on “Manuals”.</cmd>
      </step>
      <step>
        <cmd>Click on “Shipping”.</cmd>
      </step>
      <step>
        <cmd>You can save or print it from your browser window.</cmd>
      </step>
    </steps>
  </taskbody>
</task>

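I need to select everything whose audience is not equal to "print".

I've tried all sorts of approaches, but I can't get any of them to work the way I want.

Here is my latest attempt, which is close but not quite right: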

task/taskbody//*[not(@audience = "print")]

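The problem is that it works fine for stripping out nodes with a value of "print" one level down. However, the first <p> with a "print" value sits inside <context>, and that node never seems to get picked up by the filter.

Here is the result of the query: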

<?xml version="1.0" encoding="UTF-8"?>
<result>
<context>
      <p>Your box has a document.</p>
      <p audience="print">To get the document:</p>
      <p audience="web">
        <xref href="/node/6308" scope="external">Click here</xref> to get the document.
      </p>
    </context>

<p>Your box has a document.</p>

<p audience="web">
        <xref href="/node/6308" scope="external">Click here</xref> to get the document.
      </p>

<xref href="/node/6308" scope="external">Click here</xref>

<step>
        <cmd>Go to 
          <u>www.google.com</u>.
        </cmd>
      </step>

<cmd>Go to 
          <u>www.google.com</u>.
        </cmd>

<u>www.google.com</u>

<step>
        <cmd>Click on the “Resource” button.</cmd>
        <info>
          <fig frame="all">
            <image height="1.650in" href="resource.ai" width="4.500in"/>
          </fig>
        </info>
      </step>

<cmd>Click on the “Resource” button.</cmd>

<info>
          <fig frame="all">
            <image height="1.650in" href="resource.ai" width="4.500in"/>
          </fig>
        </info>

<fig frame="all">
            <image height="1.650in" href="resource.ai" width="4.500in"/>
          </fig>

<image height="1.650in" href="resource.ai" width="4.500in"/>

<step>
        <cmd>Click on “Manuals”.</cmd>
      </step>

<cmd>Click on “Manuals”.</cmd>

<step>
        <cmd>Click on “Shipping”.</cmd>
      </step>

<cmd>Click on “Shipping”.</cmd>

<step>
        <cmd>You can save or print it from your browser window.</cmd>
      </step>

<cmd>You can save or print it from your browser window.</cmd>

</result>

It grabs the nodes without the attribute and the nodes with "web", and it excludes most, but not all, of the nodes with "print".

Any suggestions?

Answer 1

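This expression selects every element that is neither itself marked audience="print" nor nested inside such an element, and that has no descendant marked audience="print". That covers elements with no @audience attribute as well as elements whose @audience has some other value, such as web: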

//*[not(descendant::*[@audience='print']) and not(ancestor-or-self::*[@audience='print'])]

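As written, it selects <title>, the first and third <p> children of <context>, and the <xref> inside the third <p>. It does not select <steps> or the second <p>, because their audience attribute is print. It also skips <task>, <taskbody>, and <context>, because each of them has a descendant with audience="print".

To exclude the title (narrowing the context to taskbody), use: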
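To check which elements that predicate keeps, here is a minimal Python sketch. It does not run the XPath itself: the standard library's `xml.etree.ElementTree` only supports a small XPath subset (no `not()` and no `ancestor-or-self` axis, as far as I know), so the two conditions are re-implemented by hand. The XML is a trimmed copy of the question's document, and the names `is_print` and `select` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (the <steps> block is shortened).
XML = """<task>
  <title id="30014">Instructions</title>
  <taskbody>
    <context>
      <p>Your box has a document.</p>
      <p audience="print">To get the document:</p>
      <p audience="web">
        <xref href="/node/6308" scope="external">Click here</xref> to get the document.
      </p>
    </context>
    <steps audience="print">
      <step><cmd>Go to <u>www.google.com</u>.</cmd></step>
      <step><cmd>Click on "Manuals".</cmd></step>
    </steps>
  </taskbody>
</task>"""


def is_print(el):
    """True when the element carries audience="print"."""
    return el.get("audience") == "print"


def select(root):
    """Hand-written equivalent of the two not(...) conditions in the predicate."""
    selected = []

    def walk(el, inside_print):
        # not(ancestor-or-self::*[@audience='print'])
        inside_print = inside_print or is_print(el)
        # not(descendant::*[@audience='print']); iter() yields el itself first
        has_print_desc = any(is_print(d) for d in el.iter() if d is not el)
        if not inside_print and not has_print_desc:
            selected.append(el)
        for child in el:
            walk(child, inside_print)

    walk(root, False)
    return selected


root = ET.fromstring(XML)
tags = [(el.tag, el.get("audience")) for el in select(root)]
print(tags)
```

On this input the list contains `title`, the two non-print `p` elements, and `xref`. `task`, `taskbody`, and `context` drop out because each has a print descendant.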

//task/taskbody//*[not(descendant::*[@audience='print']) and not(ancestor-or-self::*[@audience='print'])]
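One thing worth keeping in mind: an XPath expression only selects nodes, and each selected element is serialized together with its whole subtree. That is why the question's output shows <context> still carrying its print paragraph. If the real goal is a filtered copy of the document rather than a list of nodes, an alternative is to delete the print elements from the tree instead of selecting around them. The sketch below does that with Python's standard `xml.etree.ElementTree`. This is a different technique from the XPath above, and the `prune` helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (the <steps> block is shortened).
XML = """<task>
  <title id="30014">Instructions</title>
  <taskbody>
    <context>
      <p>Your box has a document.</p>
      <p audience="print">To get the document:</p>
      <p audience="web">
        <xref href="/node/6308" scope="external">Click here</xref> to get the document.
      </p>
    </context>
    <steps audience="print">
      <step><cmd>Go to <u>www.google.com</u>.</cmd></step>
      <step><cmd>Click on "Manuals".</cmd></step>
    </steps>
  </taskbody>
</task>"""


def prune(parent, attr="audience", value="print"):
    """Remove every descendant of `parent` whose `attr` equals `value`."""
    for child in list(parent):  # iterate over a copy, since we mutate parent
        if child.get(attr) == value:
            # Note: this also drops the text that follows the element (its tail).
            parent.remove(child)
        else:
            prune(child, attr, value)


root = ET.fromstring(XML)
prune(root)
remaining = [el.tag for el in root.iter()]
print(remaining)
print(ET.tostring(root, encoding="unicode"))
```

With a library that implements full XPath 1.0, the nodes to delete could presumably be located with `//*[@audience='print']` instead of the manual walk.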

XPath query to select any descendant that does not have a specific value for a specific attribute
