I have a large XML corpus document whose structure looks roughly like this:
<corpus>
<document n="001">
<front>
<title>foo title</title>
<group n="foo_group_A"/>
</front>
<body>
<seg n="1">some text with markups</seg>
<seg n="2">some text with markups</seg>
<seg n="3">some text with markups</seg>
</body>
</document>
<document n="002">
<front>
<title>foo title</title>
<group n="foo_group_A"/>
</front>
<body>
<seg n="1">some text with markups</seg>
<seg n="2">some text with markups</seg>
</body>
</document>
<document n="003">
<front>
<title>foo title</title>
<group n="foo_group_A"/>
</front>
<body>
<seg n="1">some text with markups</seg>
<seg n="2">some text with markups</seg>
<seg n="3">some text with markups</seg>
</body>
</document>
<document n="004">
<front>
<title>foo title</title>
<group n="foo_group_B"/>
</front>
<body>
<seg n="1">some text with markups</seg>
</body>
</document>
<document n="005">
<front>
<title>foo title</title>
<group n="foo_group_B"/>
</front>
<body>
<seg n="1">some text with markups</seg>
<seg n="2">some text with markups</seg>
</body>
</document>
[...]
</corpus>
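I am using XSLT 3.0 to pre-process this XML file into a different XML format before the final output to PDF. As part of the transformation, I would like to collect the `<document>` elements and "wrap" them in a new `<chapter>` element that reflects the value of `front/group/@n`. The new corpus would look like this, where the value of `group/@n` provides the logic for grouping under the new `chapter`: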
<corpus>
<chapter n="foo_group_A">
<document n="001">
<front>
<title>foo title</title>
</front>
<body>
<seg n="1">some text with markups</seg>
<seg n="2">some text with markups</seg>
<seg n="3">some text with markups</seg>
</body>
</document>
<document n="002">
<front>
<title>foo title</title>
</front>
<body>
<seg n="1">some text with markups</seg>
<seg n="2">some text with markups</seg>
</body>
</document>
<document n="003">
<front>
<title>foo title</title>
</front>
<body>
<seg n="1">some text with markups</seg>
<seg n="2">some text with markups</seg>
<seg n="3">some text with markups</seg>
</body>
</document>
</chapter>
<chapter n="foo_group_B">
<document n="004">
<front>
<title>foo title</title>
</front>
<body>
<seg n="1">some text with markups</seg>
</body>
</document>
<document n="005">
<front>
<title>foo title</title>
</front>
<body>
<seg n="1">some text with markups</seg>
<seg n="2">some text with markups</seg>
</body>
</document>
</chapter>
[...]
</corpus>
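The file is already pre-sorted by foo_group_A, foo_group_B, etc., so no additional sorting is needed. The transformation only needs to create a new `<chapter>` element to contain the related documents. I have tried this with `xsl:for-each`, but I think I am missing some kind of "summary" or "collection" of the groups that I could iterate over.
Many thanks in advance.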
Answer 0 (score: 3)
If you use XSLT 3 and want to group items, then you would not use `xsl:for-each` but rather `xsl:for-each-group`, for example:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

  <!-- Identity transformation: nodes without a matching template are copied as they are -->
  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="corpus">
    <xsl:copy>
      <!-- One group per distinct value of front/group/@n -->
      <xsl:for-each-group select="document" group-by="front/group/@n">
        <chapter n="{current-grouping-key()}">
          <xsl:apply-templates select="current-group()"/>
        </chapter>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the group element; its value is now carried by chapter/@n -->
  <xsl:template match="front/group"/>

</xsl:stylesheet>
http://xsltfiddle.liberty-development.net/nbUY4ki
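If the `document` elements are already sorted by the grouping key `front/group/@n`, then it suffices to use `xsl:for-each-group select="document" group-adjacent="front/group/@n"` instead of the `group-by` shown above. That also makes it easier to use streaming for large documents: add `streamable="yes"` to the `xsl:mode` declaration and group with `xsl:for-each-group select="copy-of(document)" group-adjacent="front/group/@n"`.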
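Put together, the streaming variant described above might look like the following sketch. It is the same stylesheet with only the `xsl:mode` and `xsl:for-each-group` lines changed. It assumes the corpus really is pre-sorted so that documents of one group sit next to each other, that every `document` has exactly one `front/group/@n` (with `group-adjacent` the key must be a single value), and that the processor supports XSLT 3.0 streaming (for example Saxon-EE; this choice of processor is an assumption, not something stated in the question).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

  <!-- Identity transformation, declared streamable so the corpus is not loaded into memory as a whole -->
  <xsl:mode on-no-match="shallow-copy" streamable="yes"/>
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="corpus">
    <xsl:copy>
      <!-- copy-of() materializes one document at a time as an ordinary in-memory tree;
           group-adjacent starts a new group whenever front/group/@n changes -->
      <xsl:for-each-group select="copy-of(document)" group-adjacent="front/group/@n">
        <chapter n="{current-grouping-key()}">
          <xsl:apply-templates select="current-group()"/>
        </chapter>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the group element; its value is now carried by chapter/@n -->
  <xsl:template match="front/group"/>

</xsl:stylesheet>
```

Note that `group-adjacent` only merges neighbouring documents: if the same `front/group/@n` value shows up again later in the corpus, it produces a second `chapter` with the same `n`, whereas `group-by` would collect all of them into one chapter.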