Question

I have the following categories:

<categories>
    <category>anotherparent</category>
    <category>parent</category>
    <category>parent/child1</category>
    <category>parent/child1/subchild1</category>
    <category>parent/child2</category>
    <category>parent/child3/</category>
    <category>parent/child3/subchild3</category>
</categories>

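The problem is that the category paths are redundant: every parent path is listed in addition to its children. I want to remove all parent paths and keep only the most specific level, so the result should look like this: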

<categories>
    <category>anotherparent</category>
    <category>parent/child1/subchild1</category>
    <category>parent/child2</category>
    <category>parent/child3/subchild3</category>
</categories>

I could resort to a Java extension, but I can't find the right approach or function to do this in XSLT, and I'm fairly sure it should be easy.

XSLT 2 or 3 is fine.

Answer 1

Perhaps this helps:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    expand-text="yes"
    exclude-result-prefixes="#all"
    xmlns:mf="http://example.com/mf"
    version="3.0">
  
  <xsl:function name="mf:group" as="element(category)*">
    <xsl:param name="cats"/>
    <xsl:param name="level"/>
    <xsl:choose>
      <xsl:when test="$cats?2[$level]">
        <xsl:for-each-group select="$cats[?2[$level]]" group-by="?2[$level]">
          <xsl:sequence select="mf:group(current-group(), $level + 1)"/>
        </xsl:for-each-group>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="$cats?1"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:mode on-no-match="shallow-copy"/>
  
  <xsl:output indent="yes"/>

  <xsl:template match="categories">
    <xsl:copy>
      <xsl:sequence select="mf:group(category ! [., tokenize(., '/')], 1)"/>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

This assumes, as a comment asked, that the trailing / in <category>parent/child3/</category> is a typo and should be <category>parent/child3</category>. If parent/child3/ can occur but should be treated as parent/child3, use tokenize(., '/')[normalize-space()] instead of tokenize(., '/').
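To see why the predicate matters, here is a minimal sketch that only counts tokens. It is a standalone illustration rather than part of the solution, and it assumes a processor that can start at the named template xsl:initial-template (for example via Saxon's -it option).

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- A separator at the end yields a trailing zero-length token: 3 tokens -->
    <xsl:value-of select="count(tokenize('parent/child3/', '/'))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The predicate drops empty and whitespace-only tokens: 2 tokens -->
    <xsl:value-of select="count(tokenize('parent/child3/', '/')[normalize-space()])"/>
  </xsl:template>

</xsl:stylesheet>
```

The first count is 3 and the second is 2, so with the predicate parent/child3/ and parent/child3 produce the same token sequence.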

Using a sequence of maps with two entries in the function, instead of a sequence of arrays of size 2, may be cleaner:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    expand-text="yes"
    exclude-result-prefixes="#all"
    xmlns:mf="http://example.com/mf"
    version="3.0">
  
  <xsl:function name="mf:group" as="element(category)*">
    <xsl:param name="cats" as="map(xs:string, item()*)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:choose>
      <xsl:when test="$cats?tokens[$level]">
        <xsl:for-each-group select="$cats[?tokens[$level]]" group-by="?tokens[$level]">
          <xsl:sequence select="mf:group(current-group(), $level + 1)"/>
        </xsl:for-each-group>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="$cats?cat"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:mode on-no-match="shallow-copy"/>
  
  <xsl:output indent="yes"/>

  <xsl:template match="categories">
    <xsl:copy>
      <xsl:sequence select="mf:group(category ! map { 'cat' : ., 'tokens' : tokenize(., '/') }, 1)"/>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

Again, if leading, trailing, or consecutive slashes can occur but should be ignored, you may want to use tokenize(., '/')[normalize-space()] instead of tokenize(., '/').

Answer 2

If your input XML is always in the format you posted, this works:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="category[starts-with(following-sibling::category[1],.)]"/>

</xsl:stylesheet>

See it working here: https://xsltfiddle.liberty-development.net/gVrvcxY
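Note that the template above depends on each parent path being immediately followed by one of its descendants, and a plain starts-with would also drop parent if the next sibling were an unrelated parent2. If the order cannot be guaranteed, a possible order-independent variant is sketched below. It is an assumption-laden sketch, not a tested drop-in: it assumes / is always the separator, that trailing slashes should be ignored, and that the values carry no surrounding whitespace. The variable name $prefix is only illustrative.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Drop a category when any sibling path continues it below a '/'.
       $prefix is the path without trailing slashes, plus one '/'. -->
  <xsl:template match="category[
      let $prefix := replace(., '/+$', '') || '/'
      return ../category[starts-with(., $prefix)
                         and string-length(.) gt string-length($prefix)]]"/>

</xsl:stylesheet>
```

Appending the / before comparing keeps parent from matching parent2, and the length check keeps parent/child3/ from matching itself. This compares every category against all of its siblings, so for very long lists the grouping approach in the other answer may scale better.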

XSLT: remove duplicate values by substring
