Question

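I'm trying to extract the text of the nodes I'm interested in (here big-structured-text), but these nodes contain some children that I want to skip (here title, subtitle, and code). Those "removed" nodes can have children of their own.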

Sample data:

<root>
    <big-structured-text>
        <section>
            <title>Introduction</title>
            In this part we describe Australian foreign policy....
            <subsection>
                <subtitle>Historical context</subtitle>
                After its independence...
                <meta>
                    <keyword>foreign policy</keyword>
                    <keyword>australia</keyword>
                    <code>
                        <value>XXHY-123</value>
                        <label>IRRN</label>
                    </code>
                </meta>
            </subsection>
        </section>
    </big-structured-text>
    <!-- ... -->
    <big-structured-text>
        <!-- ... -->
    </big-structured-text>
</root>

So far I have tried:

<xsl:for-each
     select="//big-structured-text">
         <text>
             <xsl:value-of select=".//*[not(*)
                 and not(ancestor-or-self::code)
                 and not(ancestor-or-self::subtitle)
                 and not(ancestor-or-self::title)
                 ]" />
         </text>
</xsl:for-each>

But this only selects nodes that have no child elements: it picks up keyword, but not the text that follows the Introduction title.

I have also tried:

<xsl:for-each
     select="//big-structured-text">
         <text>
             <xsl:value-of select=".//*[
                 not(ancestor-or-self::code)
                 and not(ancestor-or-self::subtitle)
                 and not(ancestor-or-self::title)
                 ]" />
         </text>
</xsl:for-each>

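But this returns the interesting text several times, and sometimes the uninteresting text as well (one iteration for each node itself, then once more for each of its ancestors).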

Answer 1

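Rather than for-each, you can solve this with templates. When templates are applied to an element node, the default behaviour is simply to apply them recursively to all of its child nodes (text nodes as well as other elements), and for text nodes to output the text. So all you need to do is create empty templates to suppress the elements you don't want, then let the default templates do the rest.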

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="/">
    <root>
      <xsl:apply-templates select="/root/big-structured-text" />
    </root>
  </xsl:template>

  <xsl:template match="big-structured-text">
    <text><xsl:apply-templates /></text>
  </xsl:template>

  <!-- empty template means anything inside any of these elements will be
       ignored -->
  <xsl:template match="title | subtitle | code" />
</xsl:stylesheet>
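
The default behaviour this stylesheet relies on comes from XSLT 1.0's built-in template rules. As a sketch for illustration only (you would not normally declare these yourself), a processor behaves roughly as if the following templates were always present:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- elements and the root node: recurse into all child nodes -->
  <xsl:template match="*|/">
    <xsl:apply-templates />
  </xsl:template>

  <!-- text and attribute nodes: copy their string value to the output -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="." />
  </xsl:template>

  <!-- comments and processing instructions: produce nothing -->
  <xsl:template match="processing-instruction()|comment()" />
</xsl:stylesheet>
```

Because a template that matches an element by name, such as title | subtitle | code, takes priority over the generic * rule, an empty template for those names is enough to cut off the recursion at that point.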

When run on your sample input, this produces

<?xml version="1.0"?>
<root><text>


            In this part we describe Australian foreign policy....


                After its independence...

                    foreign policy
                    australia




    </text><text>

    </text></root>

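You may want to look into using <xsl:strip-space> to get rid of some of the extraneous whitespace, but with mixed content you have to be careful not to strip too much.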
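
As one possible way to tidy the whitespace without <xsl:strip-space>, the sketch below adds a text() template to the stylesheet above. It assumes that collapsing whitespace inside each text node is acceptable for your data, which is an assumption and not something stated in the question. Note that a blanket <xsl:strip-space elements="*"/> would also remove the whitespace between the two keyword elements, so their values would run together.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="/">
    <root>
      <xsl:apply-templates select="/root/big-structured-text" />
    </root>
  </xsl:template>

  <xsl:template match="big-structured-text">
    <text><xsl:apply-templates /></text>
  </xsl:template>

  <!-- skip these elements and everything inside them -->
  <xsl:template match="title | subtitle | code" />

  <!-- Assumption: whitespace inside a text node may be collapsed.
       Whitespace-only text nodes produce nothing; any other text node is
       trimmed and followed by one space so adjacent pieces stay separated. -->
  <xsl:template match="text()">
    <xsl:if test="normalize-space(.)">
      <xsl:value-of select="normalize-space(.)" />
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

This leaves a single trailing space at the end of each non-empty text element; if that matters, it would need to be trimmed in a further step.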
