Question

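- Revised question -

Many thanks to everyone who offered potential solutions, but they all match what I had already tried, so I think I should have been clearer. I have expanded the XML a bit to make the problem more transparent.

The XML is actually a compilation of various files containing translated content. The goal is to obtain a unified document that contains only unique English strings and (after manual review and cleanup) a single translation for each string, so that it can be used as a translation memory. That is why it is currently a large file with a lot of redundant information.

Each Para contains the English master (which may be repeated dozens of times in the file) and the translation variants. In some cases it is easy because all translated versions are identical, so I end up with a single line, but in other cases it can be more complex.

So, suppose that today I have 10 Paras containing the same English content (#1), with 2 different German variants, 3 different French versions, and only one variant for each of the remaining locales. What I need to obtain is:

1 Para: 1 EN / 2 DE (v1 and v2) / 3 FR (v1, v2 and v3) / ...

And this should be repeated for every grouped unique English value in my list.

Revised XML: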

<Books>
<!--First English String (#1) with number of potential translations -->
<Para>
    <EN>English Content #1</EN>
    <DE>German Trans of #1 v1</DE>
    <FR>French Trans of #1 v1</FR>
    <!-- More locales here -->
</Para>
<Para>
    <EN>English Content #1</EN>
    <DE>German Trans of #1 v2</DE>
    <FR>French Trans of #1 v1</FR>
    <!-- More locales here -->
</Para>
<Para>
    <EN>English Content #1</EN>
    <DE>German Trans of #1 v1</DE>
    <FR>French Trans of #1 v2</FR>
    <!-- More locales here -->
</Para>
<!--Second English String (#2) with number of potential translations -->
<Para>
    <EN>English Content #2</EN>
    <DE>German Trans of #2 v1</DE>
    <FR>French Trans of #2 v1</FR>
    <!-- More locales here -->
</Para>
<Para>
    <EN>English Content #2</EN>
    <DE>German Trans of #2 v3</DE>
    <FR>French Trans of #2 v1</FR>
    <!-- More locales here -->
</Para>
<Para>
    <EN>English Content #2</EN>
    <DE>German Trans of #2 v2</DE>
    <FR>French Trans of #2 v1</FR>
    <!-- More locales here -->
</Para>
<!-- Loads of additional English strings (#3 ~ #n) with a number of potential translations -->
</Books>

The current solutions give me the following output:

<Books>
<Para>
    <EN>English Content #1</EN>
    <DE>German Trans of #1 v1</DE>
    <DE>German Trans of #1 v2</DE>
    <DE>German Trans of #2 v1</DE>
    <DE>German Trans of #2 v3</DE>
    <DE>German Trans of #2 v2</DE>
    <FR>French Trans of #1 v1</FR>
    <FR>French Trans of #1 v1</FR>
    <FR>French Trans of #1 v2</FR>
    <FR>French Trans of #2 v1</FR>
</Para>
</Books>

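So only the first EN tag is taken, and all the other tags are then grouped regardless of differences in the English master string. My goal, however, is to obtain the following: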

<Books>
<!-- First Grouped EN string and linked grouped translations -->
<Para>
    <EN>English Content #1</EN>
    <DE>German Trans of #1 v1</DE>
    <DE>German Trans of #1 v2</DE>
    <FR>French Trans of #1 v1</FR>
    <FR>French Trans of #1 v2</FR>
</Para>
<!-- Second Grouped EN string and linked grouped translations -->
<Para>
    <EN>English Content #2</EN>
    <DE>German Trans of #2 v1</DE>
    <DE>German Trans of #2 v3</DE>
    <DE>German Trans of #2 v2</DE>
    <FR>French Trans of #2 v1</FR>
</Para>
<!-- 3d to n Grouped EN string and linked grouped translations -->
</Books>
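
The target grouping can also be stated independently of XSLT. The following is a minimal Python sketch, not part of the original question, that uses only the standard library's xml.etree.ElementTree. The function name group_paras and the trimmed SAMPLE input are assumptions made for illustration. It groups Para elements by their EN text and, within each group, keeps each distinct translation once per locale in first-seen order.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample modelled on the XML above.
SAMPLE = """<Books>
<Para><EN>English Content #1</EN><DE>German Trans of #1 v1</DE><FR>French Trans of #1 v1</FR></Para>
<Para><EN>English Content #1</EN><DE>German Trans of #1 v2</DE><FR>French Trans of #1 v1</FR></Para>
<Para><EN>English Content #1</EN><DE>German Trans of #1 v1</DE><FR>French Trans of #1 v2</FR></Para>
<Para><EN>English Content #2</EN><DE>German Trans of #2 v1</DE><FR>French Trans of #2 v1</FR></Para>
<Para><EN>English Content #2</EN><DE>German Trans of #2 v3</DE><FR>French Trans of #2 v1</FR></Para>
</Books>"""


def group_paras(xml_text):
    """Return a new <Books> element with one <Para> per unique EN string."""
    source = ET.fromstring(xml_text)
    # EN text -> {locale tag -> distinct translations, in first-seen order}
    groups = {}
    for para in source.findall("Para"):
        english = para.findtext("EN")
        locales = groups.setdefault(english, {})
        for child in para:
            if child.tag == "EN":
                continue
            variants = locales.setdefault(child.tag, [])
            if child.text not in variants:
                variants.append(child.text)

    # Rebuild the document: one Para per English string, locales kept together.
    result = ET.Element("Books")
    for english, locales in groups.items():
        para = ET.SubElement(result, "Para")
        ET.SubElement(para, "EN").text = english
        for tag, variants in locales.items():
            for text in variants:
                ET.SubElement(para, tag).text = text
    return result


if __name__ == "__main__":
    print(ET.tostring(group_paras(SAMPLE), encoding="unicode"))
```

For the trimmed sample this yields two Para elements: the first with two DE and two FR variants, the second with two DE variants and one FR variant. The outer dictionary plays the role of the grouping on EN, and the inner one the role of the grouping on each translation.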

Answer 1

Extended XSLT 2.0 answer, covering the update in the question

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="Books">
        <xsl:copy>
            <xsl:for-each-group select="*" 
                group-by="EN">
                <xsl:copy>
                   <xsl:copy-of select="EN"/>
                   <xsl:for-each-group select="current-group()/*[not(local-name()='EN')]"
                        group-by=".">
                        <xsl:sort select="local-name()"/>
                        <xsl:copy-of select="."/>
                    </xsl:for-each-group>
                </xsl:copy>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

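Extended XSLT 1.0 answer, covering the update in the question

Even though you now need two different kinds of keys, you can still use the same kind of solution. This is the first straightforward solution that comes to mind: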

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:key name="main" match="Para" use="EN"/>
    <xsl:key name="locale" match="Para/*[not(self::EN)]" use="concat(../EN,.)"/>

    <xsl:template match="Books">
        <xsl:copy>
            <xsl:apply-templates select="Para[
                generate-id()
                = generate-id(key('main',EN)[1])]" mode="EN"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*" mode="EN">
        <xsl:copy>
            <xsl:copy-of select="EN"/>
            <xsl:apply-templates select="../Para/*[
                generate-id()
                = generate-id(key('locale',concat(current()/EN,.))[1])]" mode="locale">
                <xsl:sort select="local-name()"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*" mode="locale">
        <xsl:copy>
            <xsl:value-of select="."/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

When applied to the newly provided input, it produces:

<Books>
    <Para>
        <EN>English Content #1</EN>
        <DE>German Trans of #1 v1</DE>
        <DE>German Trans of #1 v2</DE>
        <FR>French Trans of #1 v1</FR>
        <FR>French Trans of #1 v2</FR>
    </Para>
    <Para>
        <EN>English Content #2</EN>
        <DE>German Trans of #2 v1</DE>
        <DE>German Trans of #2 v3</DE>
        <DE>German Trans of #2 v2</DE>
        <FR>French Trans of #2 v1</FR>
    </Para>
</Books>

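This XSLT 1.0 transformation does exactly what you asked for, and you can use it as a starting point for creating the result tree if you wish: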

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>


    <xsl:key name="locale" match="Para/*[not(local-name()='EN')]" use="text()"/>

    <xsl:template match="Books">
        <xsl:copy>
            <Para>
                <xsl:copy-of select="Para[1]/EN"/>
                <xsl:apply-templates select="Para/*[
                    generate-id()
                    = generate-id(key('locale',text())[1])]" mode="group">
                    <xsl:sort select="local-name()"/>
                </xsl:apply-templates>
            </Para>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*" mode="group">
        <xsl:copy>
            <xsl:value-of select="."/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

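Explanation:

xsl:key is used to group all elements (except EN) by their content
A simple direct copy of the first Para/EN node
The Muenchian grouping method, together with xsl:sort, outputs the other elements grouped as requested (elements with identical content are reported once)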
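
The steps above can be sketched outside XSLT as well. The following Python fragment is only an illustration under stated assumptions: the helper name muenchian_first_of_each and the (tag, text) sample pairs are hypothetical, and a list position stands in for generate-id(). It builds an index from key value to first position (the analogue of xsl:key), keeps a node only when it is the first one stored under its key, and then sorts by element name.

```python
def muenchian_first_of_each(nodes, key_of):
    """Keep only the first node for each key value, in document order."""
    # Step 1: build the index, the analogue of an xsl:key declaration.
    first_position = {}
    for position, node in enumerate(nodes):
        first_position.setdefault(key_of(node), position)
    # Step 2: a node survives only if it is the first node under its key,
    # the analogue of generate-id() = generate-id(key(...)[1]).
    return [
        node
        for position, node in enumerate(nodes)
        if first_position[key_of(node)] == position
    ]


# Hypothetical (tag, text) pairs standing in for the non-EN children
# of all Para elements, listed in document order.
translations = [
    ("DE", "German Trans v1"), ("FR", "French Trans v1"),
    ("DE", "German Trans v2"), ("FR", "French Trans v1"),
    ("DE", "German Trans v1"), ("FR", "French Trans v2"),
]

# Key on the text only, as use="text()" does in the stylesheet.
unique = muenchian_first_of_each(translations, key_of=lambda node: node[1])
# Step 3: order by element name; list.sort is stable, so document order
# is kept within each locale.
unique.sort(key=lambda node: node[0])
print(unique)
```

With this sample, the duplicates of "German Trans v1" and "French Trans v1" are dropped and the remaining four pairs come out with the DE entries before the FR entries.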

When applied to the input provided in the question, the result tree is:

<Books>
   <Para>
      <EN>Some English Content</EN>
      <DE>German Trans v1</DE>
      <DE>German Trans v2</DE>
      <FR>French Trans v1</FR>
      <FR>French Trans v2</FR>
   </Para>
</Books>

The same result (with a shorter transformation) using XSLT 2.0 xsl:for-each-group:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="Books">
        <xsl:copy>
            <Para>
                <xsl:copy-of select="Para[1]/EN"/>
                <xsl:for-each-group select="Para/*[not(local-name()='EN')]" 
                            group-by=".">
                    <xsl:sort select="local-name()"/>
                    <xsl:copy>
                        <xsl:value-of select="."/>
                    </xsl:copy>
                </xsl:for-each-group>
            </Para>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Answer 2

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:key name="kLangByValAndText"
  match="Para/*[not(self::EN)]"
  use="concat(name(), '+++', .)"/>

 <xsl:template match="/">
  <Books>
   <Para>
    <xsl:copy-of select="/*/Para[1]/EN"/>
    <xsl:for-each select=
    "/*/*/*[generate-id()
           =
            generate-id(key('kLangByValAndText',
                            concat(name(), '+++', .)
                            )
                            [1]
                       )
           ]
    ">
     <xsl:sort select="name()"/>
     <xsl:copy-of select="."/>
    </xsl:for-each>
   </Para>
  </Books>
 </xsl:template>
</xsl:stylesheet>

when applied to this XML document (an extended version of the provided one, to make it more interesting):

<Books>
    <Para>
        <EN>Some English Content</EN>
        <DE>German Trans v1</DE>
        <FR>French Trans v1</FR>
        <!-- More locales here -->
    </Para>
    <Para>
        <EN>Some English Content</EN>
        <EN-US>Some English Content</EN-US>
        <DE>German Trans v1</DE>
        <FR>French Trans v1</FR>
        <!-- More locales here -->
    </Para>
    <Para>
        <EN>Some English Content</EN>
        <Australian>Some English Content</Australian>
        <DE>German Trans v1</DE>
        <FR>French Trans v2</FR>
        <!-- More locales here -->
    </Para>
    <!-- Much more para's hereafter containing variety of <EN> Content -->
</Books>

produces the wanted, correct result:

<Books>
   <Para>
      <EN>Some English Content</EN>
      <Australian>Some English Content</Australian>
      <DE>German Trans v1</DE>
      <EN-US>Some English Content</EN-US>
      <FR>French Trans v1</FR>
      <FR>French Trans v2</FR>
   </Para>
</Books>

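Explanation: Muenchian grouping on a composite (two-part) key made of the element name and its text.

Do note: grouping on the translation text alone (as the other answer to this question does) loses the <Australian> translation, because it has the same text as <EN-US>. Applying @empo's solution to this same document yields (<Australian> is lost!):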

<Books>
   <Para>
      <EN>Some English Content</EN>
      <DE>German Trans v1</DE>
      <EN-US>Some English Content</EN-US>
      <FR>French Trans v1</FR>
      <FR>French Trans v2</FR>
   </Para>
</Books>
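
The difference between the two keys can be reproduced with a small Python sketch. This is an illustration only: the helper first_per_key and the NODES list are hypothetical, with NODES holding the non-EN children of the document above as (name, text) pairs in document order. A tuple is used as the composite key here, standing in for the concat(name(), '+++', .) string.

```python
def first_per_key(nodes, key_of):
    """Keep the first node for each key value, preserving document order."""
    seen = set()
    kept = []
    for node in nodes:
        key = key_of(node)
        if key not in seen:
            seen.add(key)
            kept.append(node)
    return kept


# Non-EN children of the three Para elements, in document order.
NODES = [
    ("DE", "German Trans v1"), ("FR", "French Trans v1"),
    ("EN-US", "Some English Content"),
    ("DE", "German Trans v1"), ("FR", "French Trans v1"),
    ("Australian", "Some English Content"),
    ("DE", "German Trans v1"), ("FR", "French Trans v2"),
]

# Key on the text only: <Australian> collides with <EN-US> and is dropped.
by_text = sorted(first_per_key(NODES, lambda n: n[1]), key=lambda n: n[0])

# Composite key (name, text): both elements survive.
by_name_and_text = sorted(first_per_key(NODES, lambda n: n), key=lambda n: n[0])

print([name for name, _ in by_text])
print([name for name, _ in by_name_and_text])
```

Keying on the pair keeps elements apart whenever either the name or the text differs, which is why the composite key preserves <Australian> while the text-only key does not.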

Answer 3

Another Muenchian grouping, with a composite key at the sub-level:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" />
  <xsl:key name="english" match="EN" use="." />
  <xsl:key name="others" match="Para/*[not(self::EN)]" use="concat(../EN, '&#160;', ., '&#160;', name())" />
  <xsl:template match="/Books">
    <Books>
      <xsl:for-each select="Para/EN[generate-id() = generate-id(key('english', .)[1])]">
        <Para>
          <xsl:copy-of select=".|key('english', .)/../*[not(self::EN)][generate-id() = generate-id(key('others', concat(current(), '&#160;', ., '&#160;', name()))[1])]" />
        </Para>
      </xsl:for-each>
    </Books>
  </xsl:template>
</xsl:stylesheet>

Answer 4

Using Saxon 9, when I apply the stylesheet

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <xsl:template match="Books">
    <xsl:copy>
      <xsl:for-each-group select="Para" group-by="EN">
        <xsl:apply-templates select="."/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Para">
    <xsl:copy>
      <xsl:copy-of select="EN"/>
      <xsl:for-each-group select="current-group()/(* except EN)" group-by="node-name(.)">
        <xsl:for-each-group select="current-group()" group-by=".">
          <xsl:copy-of select="."/>
        </xsl:for-each-group>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

to the input

<Books>
<!--First English String (#1) with number of potential translations -->
<Para>
    <EN>English Content #1</EN>
    <DE>German Trans of #1 v1</DE>
    <FR>French Trans of #1 v1</FR>
    <!-- More locales here -->
</Para>
<Para>
    <EN>English Content #1</EN>
    <DE>German Trans of #1 v2</DE>
    <FR>French Trans of #1 v1</FR>
    <!-- More locales here -->
</Para>
<Para>
    <EN>English Content #1</EN>
    <DE>German Trans of #1 v1</DE>
    <FR>French Trans of #1 v2</FR>
    <!-- More locales here -->
</Para>
<!--Second English String (#2) with number of potential translations -->
<Para>
    <EN>English Content #2</EN>
    <DE>German Trans of #2 v1</DE>
    <FR>French Trans of #2 v1</FR>
    <!-- More locales here -->
</Para>
<Para>
    <EN>English Content #2</EN>
    <DE>German Trans of #2 v3</DE>
    <FR>French Trans of #2 v1</FR>
    <!-- More locales here -->
</Para>
<Para>
    <EN>English Content #2</EN>
    <DE>German Trans of #2 v2</DE>
    <FR>French Trans of #2 v1</FR>
    <!-- More locales here -->
</Para>
</Books>

I get the result

<Books>
   <Para>
      <EN>English Content #1</EN>
      <DE>German Trans of #1 v1</DE>
      <DE>German Trans of #1 v2</DE>
      <FR>French Trans of #1 v1</FR>
      <FR>French Trans of #1 v2</FR>
   </Para>
   <Para>
      <EN>English Content #2</EN>
      <DE>German Trans of #2 v1</DE>
      <DE>German Trans of #2 v3</DE>
      <DE>German Trans of #2 v2</DE>
      <FR>French Trans of #2 v1</FR>
   </Para>
</Books>

How to group elements by content (XSLT 2.0)?

4 answers: