Question

I have a large XML corpus document structured roughly as follows:

<corpus>
   <document n="001">
       <front>
          <title>foo title</title>
          <group n="foo_group_A"/>
       </front>
       <body>
          <seg n="1">some text with markups</seg>
          <seg n="2">some text with markups</seg>
          <seg n="3">some text with markups</seg>
       </body>
   </document>
   <document n="002">
       <front>
          <title>foo title</title>
          <group n="foo_group_A"/>
       </front>
       <body>
          <seg n="1">some text with markups</seg>
          <seg n="2">some text with markups</seg>
       </body>
   </document>
   <document n="003">
       <front>
          <title>foo title</title>
          <group n="foo_group_A"/>
       </front>
       <body>
          <seg n="1">some text with markups</seg>
          <seg n="2">some text with markups</seg>
          <seg n="3">some text with markups</seg>
       </body>
   </document>
   <document n="004">
       <front>
          <title>foo title</title>
          <group n="foo_group_B"/>
       </front>
       <body>
          <seg n="1">some text with markups</seg>
       </body>
   </document>
   <document n="005">
       <front>
          <title>foo title</title>
          <group n="foo_group_B"/>
       </front>
       <body>
          <seg n="1">some text with markups</seg>
          <seg n="2">some text with markups</seg>
       </body>
   </document>
    [...]
</corpus>

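I am using XSL 3.0 to preprocess this XML file into a differently structured XML file before final output to PDF. As part of the transformation, I want to collect the <document> elements and "wrap" them in a new <chapter> element that reflects the front/group/@n value. The new corpus would look like this, where the group/@n value provides the logic for grouping under the new chapter: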

<corpus>
  <chapter n="foo_group_A">
   <document n="001">
       <front>
          <title>foo title</title>
       </front>
       <body>
          <seg n="1">some text with markups</seg>
          <seg n="2">some text with markups</seg>
          <seg n="3">some text with markups</seg>
       </body>
   </document>
   <document n="002">
       <front>
          <title>foo title</title>
       </front>
       <body>
          <seg n="1">some text with markups</seg>
          <seg n="2">some text with markups</seg>
       </body>
   </document>
   <document n="003">
       <front>
          <title>foo title</title>
       </front>
       <body>
          <seg n="1">some text with markups</seg>
          <seg n="2">some text with markups</seg>
          <seg n="3">some text with markups</seg>
       </body>
   </document>
  </chapter>
  <chapter n="foo_group_B">
   <document n="004">
       <front>
          <title>foo title</title>
       </front>
       <body>
          <seg n="1">some text with markups</seg>
       </body>
   </document>
   <document n="005">
       <front>
          <title>foo title</title>
       </front>
       <body>
          <seg n="1">some text with markups</seg>
          <seg n="2">some text with markups</seg>
       </body>
   </document>
  </chapter>
    [...]
</corpus>

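The file is already pre-sorted by foo_group_A, foo_group_B, and so on, so no additional sorting is needed. All that is required is a new <chapter> element to contain the related documents. I have tried this with xsl:for-each, but I think I am missing some way of "collecting" or "aggregating" the groups so that I can iterate over them.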

Many thanks in advance.

Answer 1

If you are using XSLT 3 and want to group items, then you would not use xsl:for-each but rather xsl:for-each-group, for example:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="corpus">
      <xsl:copy>
          <xsl:for-each-group select="document" group-by="front/group/@n">
              <chapter n="{current-grouping-key()}">
                  <xsl:apply-templates select="current-group()"/>
              </chapter>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>  

  <xsl:template match="front/group"/>

</xsl:stylesheet>

http://xsltfiddle.liberty-development.net/nbUY4ki

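If the document elements are already sorted by the grouping key front/group/@n, then it is sufficient to use xsl:for-each-group select="document" group-adjacent="front/group/@n" instead of the group-by above. That also makes it easier to use streaming for large documents: add streamable="yes" to the xsl:mode declaration and group with xsl:for-each-group select="copy-of(document)" group-adjacent="front/group/@n".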
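
For reference, here is a minimal sketch of the non-streaming group-adjacent variant, written out as a complete stylesheet. It is the stylesheet from this answer with only the grouping attribute changed. It assumes the documents really are pre-sorted by group and that every document has exactly one front/group/@n. As far as I know, group-adjacent raises a type error when the key is an empty sequence, so a document without a group element would need separate handling.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

  <!-- Identity transform: copy anything without a more specific template. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="corpus">
    <xsl:copy>
      <!-- group-adjacent starts a new group whenever the key changes between
           consecutive document elements, so it relies on the input order.
           Assumption: each document has exactly one front/group/@n. -->
      <xsl:for-each-group select="document" group-adjacent="front/group/@n">
        <chapter n="{current-grouping-key()}">
          <xsl:apply-templates select="current-group()"/>
        </chapter>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the group element; its value now lives on chapter/@n. -->
  <xsl:template match="front/group"/>

</xsl:stylesheet>
```

Because group-adjacent only compares neighbouring items, a group value that reappears later in unsorted input would produce a second chapter with the same n. With group-by, all documents sharing a key end up in a single chapter regardless of position.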

XSL: creating "chapters" or "groups" from similarly tagged entries
