Question

Hello, I need help parsing the following XML.

<xmeml>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>bcd</Unit>
        <Unit2>2345</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>3456</Unit2>
    </Test>
    <Test>
        <Unit>cde</Unit>
        <Unit2>3456</Unit2>
    </Test> 
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>def</Unit>
        <Unit2>4567</Unit2>
    </Test> 
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>efg</Unit>
        <Unit2>2345</Unit2>
    </Test> 
</Doc>
</xmeml>

so that it ends up as the following:

<xmeml>
<Doc>
    <Test>
        <Unit>bcd</Unit>
        <Unit2>2345</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>3456</Unit2>
    </Test>
    <Test>
        <Unit>cde</Unit>
        <Unit2>3456</Unit2>
    </Test> 
</Doc>
<Doc>
    <Test>
        <Unit>def</Unit>
        <Unit2>4567</Unit2>
    </Test> 
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>efg</Unit>
        <Unit2>2345</Unit2>
    </Test> 
</Doc>
</xmeml>

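I am trying to create an XSLT document to do this but have not yet found one that works. I should note that the matching values needed within 'Doc', in this case "abc" and "1234", are variables in the real world and will never be a static, searchable entity.

So in plain English my XSL would read: for any parent 'Test' containing matching 'Unit' and 'Unit2' values, remove all preceding 'Test' parents that contain the same 'Unit' and 'Unit2' values, keeping only the last one.

All help is most appreciated, thanks.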

Answer 1

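This is a relatively simple approach, although I am fairly sure there is a more efficient way using the Muenchian method. If performance is not an issue, this may be easier to understand: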

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <xsl:template match="Test">
    <xsl:variable name="vUnit" select="Unit" />
    <xsl:variable name="vUnit2" select="Unit2" />
    <xsl:if test="not(following::Test[Unit = $vUnit and Unit2 = $vUnit2])">
      <xsl:call-template name="identity" />
    </xsl:if>
  </xsl:template>

  <xsl:template match="@* | node()" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

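The Test template simply checks whether a later Test element has the same Unit and Unit2 values; if there is none, the current Test is output as normal, otherwise it is dropped.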
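For readers who find the logic easier to follow outside XSLT, here is a minimal Python sketch of the same check, using only the standard library. It is an illustration of the idea, not a replacement for the stylesheet; the names `SAMPLE`, `key_of`, and `drop_earlier_duplicates_quadratic` are made up for this example.

```python
import xml.etree.ElementTree as ET

# The question's input, compacted (whitespace is not significant here).
SAMPLE = """<xmeml>
<Doc><Test><Unit>abc</Unit><Unit2>1234</Unit2></Test>
     <Test><Unit>bcd</Unit><Unit2>2345</Unit2></Test></Doc>
<Doc><Test><Unit>abc</Unit><Unit2>3456</Unit2></Test>
     <Test><Unit>cde</Unit><Unit2>3456</Unit2></Test></Doc>
<Doc><Test><Unit>abc</Unit><Unit2>1234</Unit2></Test>
     <Test><Unit>def</Unit><Unit2>4567</Unit2></Test></Doc>
<Doc><Test><Unit>abc</Unit><Unit2>1234</Unit2></Test>
     <Test><Unit>efg</Unit><Unit2>2345</Unit2></Test></Doc>
</xmeml>"""


def key_of(test):
    # The (Unit, Unit2) pair that decides whether two Tests are duplicates.
    return (test.findtext("Unit"), test.findtext("Unit2"))


def drop_earlier_duplicates_quadratic(root):
    # Collect every Test with its parent Doc, in document order,
    # before modifying the tree.
    pairs = [(doc, test)
             for doc in root.iter("Doc")
             for test in doc.findall("Test")]
    for i, (doc, test) in enumerate(pairs):
        # Mirrors following::Test[Unit = $vUnit and Unit2 = $vUnit2]:
        # scan every later Test for the same pair of values.
        if any(key_of(later) == key_of(test) for _, later in pairs[i + 1:]):
            doc.remove(test)
    return root


root = drop_earlier_duplicates_quadratic(ET.fromstring(SAMPLE))
remaining = [key_of(t) for t in root.iter("Test")]
```

Every Test is compared against all Tests after it, which is why this approach is fine for small inputs but becomes slow as the number of Test elements grows.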

Answer 2

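Many problems that involve eliminating duplicates can be tackled in XSLT 2.0 using the for-each-group construct. In this case a solution using for-each-group is not obvious, because this is not really a grouping problem (in a grouping problem we typically generate one element in the output for each group of elements in the input, and that is not the case here). I would tackle it the same way Dimitre does: use for-each-group to identify the groups, and from those work out which Test elements must be kept and which must be removed. In fact, I started working on the problem and came up with a solution very similar to Dimitre's, except that I think the last template rule can be simplified to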

<xsl:template match="Test[not(. intersect $vLastInGroup)]"/>

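This is an example of a coding pattern I sometimes use: set up a global variable whose value is a node-set containing all the elements that have a particular characteristic, then use template rules that test for membership of that global node-set (using the predicate [. intersect $node-set]). Following this pattern, and using some of the new syntax available in XSLT 3.0, I would be inclined to write the code like this: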

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:mode on-no-match="shallow-copy"/>

 <xsl:variable name="deletedElements" as="element()*">
  <xsl:for-each-group select="/*/Doc/Test"
                      group-by="Unit, Unit2" composite="yes">
   <xsl:sequence select="current-group()[position() ne last()]"/>
  </xsl:for-each-group>
 </xsl:variable>

 <xsl:template match="$deletedElements"/>
</xsl:stylesheet>
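With composite="yes", the grouping key is the pair of values rather than a single concatenated string. A small Python sketch of why that can matter, assuming hypothetical values that happen to contain the separator character (the sample data in the question contains none, so this is a precaution rather than a bug in any concat-based key):

```python
def concat_key(unit, unit2, sep="|"):
    # Single string key, in the style of concat(Unit, '|', Unit2).
    return unit + sep + unit2


def composite_key(unit, unit2):
    # Pair key, comparable in spirit to group-by="Unit, Unit2" composite="yes".
    return (unit, unit2)


# Two hypothetical, genuinely different (Unit, Unit2) pairs.
a = ("x|y", "z")
b = ("x", "y|z")

concat_collides = concat_key(*a) == concat_key(*b)           # both give "x|y|z"
composite_collides = composite_key(*a) == composite_key(*b)  # the pairs differ
```

If the real values can never contain the chosen separator, the two kinds of key group identically.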

Answer 3

I. XSLT 1.0 solution:

This is a simple (no variables, no xsl:if, no axes, no xsl:call-template) application of the most efficient known XSLT 1.0 grouping method, Muenchian grouping:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kTestByData" match="Test" use="concat(Unit, '|', Unit2)"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "Test[not(generate-id() =
            generate-id(key('kTestByData', concat(Unit, '|', Unit2))[last()])
           )]"/>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<xmeml>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>bcd</Unit>
        <Unit2>2345</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>3456</Unit2>
    </Test>
    <Test>
        <Unit>cde</Unit>
        <Unit2>3456</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>def</Unit>
        <Unit2>4567</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>efg</Unit>
        <Unit2>2345</Unit2>
    </Test>
</Doc>
</xmeml>

the wanted, correct result is produced:

<xmeml>
<Doc>
    <Test>
        <Unit>bcd</Unit>
        <Unit2>2345</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>3456</Unit2>
    </Test>
    <Test>
        <Unit>cde</Unit>
        <Unit2>3456</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>def</Unit>
        <Unit2>4567</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>efg</Unit>
        <Unit2>2345</Unit2>
    </Test>
</Doc>
</xmeml>

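Note: for node-sets with a large number of nodes to be deduplicated, the Muenchian grouping method is many times faster than the quadratic (O(N^2)) grouping that compares each node against its siblings.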
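The difference can be sketched in Python, where a dictionary plays roughly the role of xsl:key: one pass records the last Test seen for each key, and a second pass removes every Test that is not the recorded one. This is only an analogy under that assumption; the names `SAMPLE`, `last_in_each_group`, and `drop_earlier_duplicates_keyed` are invented for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<xmeml>
<Doc><Test><Unit>abc</Unit><Unit2>1234</Unit2></Test>
     <Test><Unit>bcd</Unit><Unit2>2345</Unit2></Test></Doc>
<Doc><Test><Unit>abc</Unit><Unit2>1234</Unit2></Test>
     <Test><Unit>def</Unit><Unit2>4567</Unit2></Test></Doc>
</xmeml>"""


def key_of(test):
    return (test.findtext("Unit"), test.findtext("Unit2"))


def last_in_each_group(tests):
    # One pass: a later Test overwrites an earlier one under the same key,
    # so each key ends up mapped to its last Test, like key(...)[last()].
    last = {}
    for test in tests:
        last[key_of(test)] = test
    return last


def drop_earlier_duplicates_keyed(root):
    last = last_in_each_group(list(root.iter("Test")))
    for doc in list(root.iter("Doc")):
        for test in doc.findall("Test"):
            # Identity check, analogous to comparing generate-id() values.
            if last[key_of(test)] is not test:
                doc.remove(test)
    return root


root = drop_earlier_duplicates_keyed(ET.fromstring(SAMPLE))
remaining = [key_of(t) for t in root.iter("Test")]
```

Each Test is visited a fixed number of times and looked up by key, instead of being compared with every Test that follows it.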

II. XSLT 2.0 solutions:

II.1 Here is a simple XSLT 2.0 solution (not efficient, and suitable only for small node-sets):

<xsl:stylesheet version="2.0"   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "Test[concat(Unit, '+', Unit2) = following::Test/concat(Unit, '+', Unit2)]"/>
</xsl:stylesheet>

II.2 An efficient solution using xsl:for-each-group:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:variable name="vLastInGroup" as="element()*">
  <xsl:for-each-group select="/*/Doc/Test"
       group-by="concat(Unit, '+', Unit2)">
   <xsl:sequence select="current-group()[last()]"/>
  </xsl:for-each-group>
 </xsl:variable>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "Test[for $t in .
        return
         not($vLastInGroup[. is $t])
      ]"/>
</xsl:stylesheet>

XSL: remove all preceding siblings based on element value
