Question

For example, suppose the input XML has the following structure:

<root>
  <a>
    <aa>1</aa>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <b>
    <ba>4</ba>
    <bb>5</bb>
  <b>
  <c>
    <ca>
      <caa>6</caa>
      <cab>7</cab>
    </ca>
  </c>
</root>

Given a set of XPaths identifying the elements to keep:

/root/a/ab,
/root/a/ac,
/root/c/ca/cab

the resulting XML should be:

<root>
  <a>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>

How can this be expressed in XSLT?

Thanks in advance.

Answer 1

Here is an example using Saxon 9.5 PE or EE and XSLT 3.0 (the working draft version currently implemented in that Saxon release):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

<xsl:param name="paths" as="xs:string">
/root/a/ab,
/root/a/ac,
/root/c/ca/cab
</xsl:param>

<xsl:variable name="nodes" as="node()*">
  <xsl:evaluate xpath="$paths" context-item="/"/>
</xsl:variable>

<xsl:output indent="yes"/>

<xsl:template match="*[(.//node(), .//@*) intersect $nodes]">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()[(., .//node(), .//@*) intersect $nodes]"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="node()[. intersect $nodes]">
  <xsl:copy-of select="."/>
</xsl:template>

</xsl:stylesheet>

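Here is a different version that uses the new XSLT 3.0 feature of allowing a variable reference as a match pattern, which I assume makes the code more efficient (and more readable):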

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

<xsl:param name="paths" as="xs:string">
/root/a/ab,
/root/a/ac,
/root/c/ca/cab
</xsl:param>

<xsl:variable name="nodes" as="node()*">
  <xsl:evaluate xpath="$paths" context-item="/"/>
</xsl:variable>

<xsl:variable name="ancestors" as="node()*" select="$nodes/ancestor::node()"/>

<xsl:output indent="yes"/>

<xsl:template match="$ancestors">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()[. intersect $ancestors or . intersect $nodes]"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="$nodes">
  <xsl:copy-of select="."/>
</xsl:template>

</xsl:stylesheet>

Answer 2

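To do this in XSLT 1.0 (possibly with a little help from EXSLT) or 2.0, you could first decompose each given path into itself and its ancestor paths, so that, for example:

/root/c/ca/cab

becomes: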

<path>/root/c/ca/cab</path>
<path>/root/c/ca</path>
<path>/root/c</path>
<path>/root</path>

This should not be difficult to do with a named recursive template.
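The decomposition step could be sketched roughly as follows. This is only an illustration, not part of the original answer: the template name `expand-path`, its parameters `rest` and `prefix`, and the hard-coded sample path are all assumptions made for the example. It builds the paths from the root downwards, so they come out shortest first; the order does not matter for the membership test used later.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical entry point: expands one hard-coded path for demonstration. -->
  <xsl:template match="/">
    <paths>
      <xsl:call-template name="expand-path">
        <!-- Drop the leading slash before splitting into steps. -->
        <xsl:with-param name="rest" select="substring-after('/root/c/ca/cab', '/')"/>
      </xsl:call-template>
    </paths>
  </xsl:template>

  <!-- Emits one <path> for every prefix of the given path, shortest first. -->
  <xsl:template name="expand-path">
    <xsl:param name="rest"/>
    <xsl:param name="prefix" select="''"/>
    <!-- Next step name: everything up to the first slash (or the whole string). -->
    <xsl:variable name="step" select="substring-before(concat($rest, '/'), '/')"/>
    <xsl:variable name="path" select="concat($prefix, '/', $step)"/>
    <xsl:variable name="remaining" select="substring-after($rest, '/')"/>
    <path><xsl:value-of select="$path"/></path>
    <!-- Recurse while there are steps left. -->
    <xsl:if test="$remaining != ''">
      <xsl:call-template name="expand-path">
        <xsl:with-param name="rest" select="$remaining"/>
        <xsl:with-param name="prefix" select="$path"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should emit `/root`, `/root/c`, `/root/c/ca` and `/root/c/ca/cab` as four `path` elements. In a real stylesheet the template would be called once per given path, and the collected result would have to be converted with `exsl:node-set()` before it can be queried in XSLT 1.0.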

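Once you have that, you can use an identity transform modified by adding a "pass-thru" parameter, so that each processed element can compute its own path, compare it against the list of given paths, and determine whether it should be included in the result tree.

In the following stylesheet, step 1 has been skipped and its result is used as if it had been given.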

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:param name="paths">
    <path>/root/a/ab</path>
    <path>/root/a</path>
    <path>/root</path>

    <path>/root/a/ac</path>
    <path>/root/a</path>
    <path>/root</path>

    <path>/root/c/ca/cab</path>
    <path>/root/c/ca</path>
    <path>/root/c</path>
    <path>/root</path>
</xsl:param>

<xsl:template match="@* | node()">
<xsl:param name="pathtrain" />
<xsl:variable name="path" select="concat($pathtrain, '/', name())" />
<xsl:if test="$path=exsl:node-set($paths)/path or not(self::*)">
    <xsl:copy>
         <xsl:apply-templates select="@* | node()">
            <xsl:with-param name="pathtrain" select="$path"/>
        </xsl:apply-templates>
    </xsl:copy>
</xsl:if>
</xsl:template>

</xsl:stylesheet>

Applied to your input (corrected so that the b element is properly closed):

<root>
  <a>
    <aa>1</aa>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <b>
    <ba>4</ba>
    <bb>5</bb>
  </b>
  <c>
    <ca>
      <caa>6</caa>
      <cab>7</cab>
    </ca>
  </c>
</root>

the following result is obtained:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <a>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>

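Edit

Note that with a string-based test like the one above, duplicate branches can produce false positives. For example, when applied to the following input:

<root>
  <a>
    <aa>1</aa>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <b>
    <ba>4</ba>
    <bb>5</bb>
  </b>
  <c>
    <ca>
      <caa>6</caa>
    </ca>
  </c>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>

the stylesheet above will produce:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <a>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <c>
    <ca/>
  </c>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>

The first c/ca branch is kept, empty, because its path string /root/c/ca matches an ancestor path of the requested cab, even though it contains no requested element. If this is a problem, I will post another (more complex) XSLT 1.0 answer that eliminates it by testing unique IDs.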

Answer 3

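Here is a more complex XSLT 1.0 answer (which also requires the EXSLT node-set() function). It solves the duplicate-branch problem by performing the transformation in three passes:

In the first pass, a template with a "pass-thru" path parameter, similar to the one in my previous answer, walks the tree and collects the IDs of the given elements in order to identify them;

In the second pass, each given element "collects" the IDs of itself and its ancestors;

In the third and final pass, an identity transform template again traverses the entire source tree and outputs only the elements whose IDs were collected in step 2.

Note that no preprocessing of the given paths is needed in this version.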

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:param name="paths">
    <path>/root/a/ab</path>
    <path>/root/a/ac</path>
    <path>/root/c/ca/cab</path>
</xsl:param>

<!-- first pass: get ids of given nodes -->
<xsl:variable name="ids">
    <xsl:apply-templates select="/" mode="getids"/>
</xsl:variable>

<xsl:template match="*" mode="getids">
    <xsl:param name="pathtrain" />
    <xsl:variable name="path" select="concat($pathtrain, '/', name())" />
    <xsl:if test="$path=exsl:node-set($paths)/path">
        <id><xsl:value-of select="generate-id()" /></id>
    </xsl:if>
    <xsl:apply-templates select="*" mode="getids">
        <xsl:with-param name="pathtrain" select="$path"/>
    </xsl:apply-templates>
</xsl:template>

<!-- second pass: extend the list of ids to given nodes and their ancestors-->
<xsl:variable name="extids">
    <xsl:for-each select="//*[generate-id(.)=exsl:node-set($ids)/id]">
        <xsl:for-each select="ancestor-or-self::*">
            <id><xsl:value-of select="generate-id()" /></id>
        </xsl:for-each>
    </xsl:for-each>
</xsl:variable>

<!-- third pass: output the nodes whose ids are in the extended list -->
<xsl:template match="@* | node()">
    <xsl:if test="generate-id(.)=exsl:node-set($extids)/id or not(self::*)">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>

The above stylesheet, when applied to the following "duplicate branch" input:

<root>
  <a>
    <aa>1</aa>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <b>
    <ba>4</ba>
    <bb>5</bb>
  </b>
  <c>
    <ca>
      <caa>6</caa>
    </ca>
  </c>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>

produces the following result:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <a>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>
