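I am working with XML files that must be TEI-compliant. The problem is the pb (page break) milestone element. This is not a new problem, but the existing solutions are so complex and heavyweight that I wonder whether there is a simpler approach for my case.
Take this fragment of an XML file: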
<body>
<pb n="2"/><p>Some random text is put here</p><p>Another
paragraph starts here, and in the
middle of it<pb n="3"/> a page break occurred.</p><pb n="4"/>
<p>the next paragraph begins
on a new page</p>
<p>But in the next paragraph, after<pb n="5"/>another page break, something else
happend : a note<note
type="glossary">the note content</note> and so everything
failing <pb n="6"/> because of this note.</p>
</body>
I would like to transform the XML into this HTML:
<table>
<tr><td><p>Some random text is put here</p><p>Another paragraph starts here, and in the
middle of it</p></td><td>2</td></tr>
<tr><td><p>a page break occurred.</p></td><td>3</td></tr>
<tr><td><p>the next paragraph begins on a new page</p></td><td>4</td></tr>
<tr><td><p>But in the next paragraph, after</p></td><td>5</td></tr>
<tr><td><p>another page break, something else happend : a note
<note type="glossary">the note content</note> and so everything's failing</p></td><td>6</td></tr>
<tr><td>because of this note.</td></tr>
</table>
It seems to me that this should be very simple to do with xsl:for-each-group. So, basically, I am trying something like this:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:template match="/body">
<table>
<xsl:for-each-group select="descendant::*" group-starting-with="pb">
<tr><td><xsl:value-of select="current-group()"/>
</td><td><xsl:value-of select="current-group()[1]/@n"/>
</td></tr>
</xsl:for-each-group>
</table>
</xsl:template>
</xsl:stylesheet>
Obviously, it does not work... The result is:
<table><tr><td> Some random text is put here Another paragraph starts here, and in the
middle of it a page break occurred.</td><td>2</td></tr>
<tr><td/><td>3</td></tr><tr><td>
the next paragraph begins on
a new page</td><td>4</td></tr></table>
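Am I heading in the wrong direction?
Thank you very much for your help! Christophe
Answer 0 (score: 1)
I don't think you are heading in the wrong direction with for-each-group. Sometimes I find it useful to do some "preprocessing" of the input document and store the result in a variable for further transformation.
I have this XSLT: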
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" exclude-result-prefixes="fn xs fo">
<xsl:output method="html" />
<!-- Preprocess: split every element that contains a pb -->
<xsl:variable name="preprocess">
<!-- Only the root element must not be split -->
<xsl:element name="{/node()[1]/name()}">
<xsl:apply-templates select="/node()[1]/node() | @*" mode="preprocess"/>
</xsl:element>
</xsl:variable>
<xsl:template match="node() | @*" mode="preprocess">
<xsl:copy>
<xsl:apply-templates select="node() | @*" mode="preprocess"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[pb]" mode="preprocess">
<!-- pb might occur in elements other than p, so keep this generic -->
<xsl:variable name="nodeName" select="name()" />
<xsl:element name="{$nodeName}">
<!-- There may be several pb elements; handle the first one here -->
<xsl:apply-templates select="pb[1]/preceding-sibling::node()" mode="preprocess" />
</xsl:element>
<xsl:copy-of select="pb[1]" />
<!-- Continue with the rest of the element: store it in another variable,
wrapped in an element of the same name, then process it
in the usual way (this also handles any further pb elements). -->
<xsl:variable name="restOfElement">
<xsl:element name="{$nodeName}">
<xsl:copy-of select="pb[1]/following-sibling::node()" />
</xsl:element>
</xsl:variable>
<xsl:apply-templates select="$restOfElement" mode="preprocess" />
</xsl:template>
<!-- Apply for-each-group on preprocessed value -->
<xsl:template match="/">
<html>
<head>
<title></title>
</head>
<body>
<table>
<xsl:for-each-group select="$preprocess/body/descendant::*" group-starting-with="pb">
<tr>
<td>
<xsl:copy-of select="current-group()[position() > 1]" />
</td>
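<!-- With group-starting-with there is no grouping key, so read @n from the pb that starts the group -->
<td><xsl:value-of select="current-group()[1]/@n"/></td>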
</tr>
</xsl:for-each-group>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
In the first step, I split every element that contains a <pb>. The variable then looks like this:
<body>
<pb n="2"/>
<p>Some random text is put here</p>
<p>Another
paragraph starts here, and in the
middle of it</p>
<pb n="3"/>
<p> a page break occurred.</p>
<pb n="4"/>
<p>the next paragraph begins
on a new page</p>
<p>But in the next paragraph, after</p>
<pb n="5"/>
<p>another page break, something else
happend : a note<note type="glossary">the note content</note> and so everything
failing </p>
<pb n="6"/>
<p> because of this note</p>
</body>
I then apply your for-each-group statement to it. It produces the following output:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title></title>
</head>
<body>
<table>
<tr>
<td>
<p>Some random text is put here</p>
<p>Another
paragraph starts here, and in the
middle of it</p>
</td>
<td>2</td>
</tr>
<tr>
<td>
<p> a page break occurred.</p>
</td>
<td>3</td>
</tr>
<tr>
<td>
<p>the next paragraph begins
on a new page</p>
<p>But in the next paragraph, after</p>
</td>
<td>4</td>
</tr>
<tr>
<td>
<p>another page break, something else
happend : a note
<note type="glossary">the note content</note> and so everything
failing </p>
<note type="glossary">the note content</note>
</td>
<td>5</td>
</tr>
<tr>
<td>
<p> because of this note</p>
</td>
<td>6</td>
</tr>
</table>
</body>
</html>
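Note that in the output above the <note> appears twice in the row for page 5: once inside its paragraph and once on its own. This happens because descendant::* puts the nested note into the grouping population as well, so copy-of copies it a second time. A minimal sketch of one way to avoid this, assuming that after the split the children of body form a flat sequence of pb and p elements, is to group only those children. The standalone stylesheet below assumes its input is the already split document shown earlier; inside the stylesheet above, the equivalent change would be to select $preprocess/body/* instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Assumption: the input is the already split document, i.e. a <body>
       whose children are a flat sequence of <pb/> and <p> elements. -->
  <xsl:template match="/">
    <table>
      <!-- Group only the children of body, so nested elements such as
           <note> stay inside their paragraph and are not copied twice. -->
      <xsl:for-each-group select="body/*" group-starting-with="pb">
        <tr>
          <td>
            <!-- Copy everything in the group except the pb itself. -->
            <xsl:copy-of select="current-group()[not(self::pb)]"/>
          </td>
          <!-- Page number from the pb that starts the group; empty when
               there is content before the first pb. -->
          <td><xsl:value-of select="current-group()[1][self::pb]/@n"/></td>
        </tr>
      </xsl:for-each-group>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

This only helps when each pb ends up as a direct child of body after preprocessing. A pb nested more than one level deep (for example inside a note inside a p) would need the split to be repeated at each level.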