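I have a pipe-delimited text file, shown below, that I need to convert with XSL into a well-formed XML structure (also shown below). The XSL below is my latest attempt at solving this, but I can't find a way to wrap the level 002 elements inside their level 001 element, i.e. to keep the parent-child relationship while iterating over the file line by line. Can anyone help?
Pipe-delimited file - input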
001|XXX|YYY
002|AAA|BBB
002|CCC|DD
001|EEF|XXX
002|HHH|GGG
XML file - desired output
<root>
<level001>
<elem name="field1">001</elem>
<elem name="field2">XXX</elem>
<elem name="field3">YYY</elem>
<level002>
<elem name="field1">002</elem>
<elem name="field2">AAA</elem>
<elem name="field3">BBB</elem>
</level002>
<level002>
<elem name="field1">002</elem>
<elem name="field2">CCC</elem>
<elem name="field3">DD</elem>
</level002>
</level001>
<level001>
<elem name="field1">001</elem>
<elem name="field2">EEF</elem>
<elem name="field3">XXX</elem>
<level002>
<elem name="field1">002</elem>
<elem name="field2">HHH</elem>
<elem name="field3">GGG</elem>
</level002>
</level001>
</root>
Current XSL
<xsl:variable name="Cols">
<col>field1,1</col>
<col>field2,2</col>
<col>field3,3</col>
</xsl:variable>
<xsl:template match="/" name="main">
<xsl:choose>
<xsl:when test="unparsed-text-available($pathToCSV, $encoding)">
<xsl:variable name="csv" select="unparsed-text($pathToCSV, $encoding)" />
<xsl:variable name="lines" select="tokenize($csv, '\n')" as="xs:string+" />
<root>
<xsl:for-each select="$lines[position() > 0]">
<xsl:if test="translate(., '  	 ', '') != ''">
<level001>
<xsl:variable name="line" select="." />
<xsl:variable name="columns" select="tokenize(.,'\|')" as="xs:string+"/>
<xsl:choose>
<xsl:when test="$columns[1]='001'">
<xsl:for-each select="$Cols/col">
<xsl:variable name="column" select="number(substring-after(.,','))"/>
<elem name="{substring-before(.,',')}">
<!-- trims the whitespace from the beginning and the ending of the value -->
<xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
</elem>
</xsl:for-each>
</xsl:when>
<xsl:when test="$columns[1]='002'">
<level002>
<xsl:for-each select="$Cols/col">
<xsl:variable name="column" select="number(substring-after(.,','))"/>
<elem name="{substring-before(.,',')}">
<!-- trims the whitespace from the beginning and the ending of the value -->
<xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
</elem>
</xsl:for-each>
</level002>
</xsl:when>
</xsl:choose>
</level001>
</xsl:if>
</xsl:for-each>
</root>
</xsl:when>
</xsl:choose>
</xsl:template>
Answer 0 (score: 1)
You can find a solution to essentially the same problem here:
http://www.saxonica.com/papers/ideadb-1.1/mhk-paper.xml
The core of it is a recursive grouping template:
<xsl:template name="process-level">
<xsl:param name="population" required="yes" as="element()*"/>
<xsl:param name="level" required="yes" as="xs:integer"/>
<xsl:for-each-group select="$population"
group-starting-with="*[xs:integer(@level) eq $level]">
<xsl:element name="{@tag}">
<xsl:copy-of select="@ID[string(.)], @REF[string(.)]"/>
<xsl:value-of select="normalize-space(@text)"/>
<xsl:call-template name="process-level">
<xsl:with-param name="population"
select="current-group()[position() != 1]"/>
<xsl:with-param name="level"
select="$level + 1"/>
</xsl:call-template>
</xsl:element>
</xsl:for-each-group>
</xsl:template>
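To relate that template to the data in the question, here is a minimal sketch of how it might be called. The `flat` variable and its `line` elements are hypothetical and written by hand for illustration; only the attribute names `level`, `tag` and `text` come from the template above. The fragment assumes it sits in the same XSLT 2.0 stylesheet as `process-level`, with the `xs` prefix declared.

```xml
<!-- Hypothetical flat input: one element per line of the text file.
     The attributes level, tag and text are the ones process-level reads. -->
<xsl:variable name="flat">
  <line level="1" tag="level001" text="XXX|YYY"/>
  <line level="2" tag="level002" text="AAA|BBB"/>
  <line level="2" tag="level002" text="CCC|DD"/>
  <line level="1" tag="level001" text="EEF|XXX"/>
  <line level="2" tag="level002" text="HHH|GGG"/>
</xsl:variable>

<xsl:template name="main">
  <root>
    <!-- Start at level 1; the template recurses with level + 1 for the
         remaining members of each group. -->
    <xsl:call-template name="process-level">
      <xsl:with-param name="population" select="$flat/line"/>
      <xsl:with-param name="level" select="1"/>
    </xsl:call-template>
  </root>
</xsl:template>
```

Note that the template writes `@text` as plain text content, so producing the `elem` children asked for in the question would still require splitting the fields inside `xsl:element`.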
Answer 1 (score: 1)
I would first convert the flat text into a flat XML structure and then group that with for-each-group group-starting-with, as in the following code sample:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="mf xs"
version="2.0">
<xsl:param name="text-url" as="xs:string" select="'test2012090401.txt'"/>
<xsl:param name="sep" as="xs:string" select="'\|'"/>
<xsl:param name="field" as="xs:string" select="'field'"/>
<xsl:output indent="yes"/>
<xsl:function name="mf:group" as="node()*">
<xsl:param name="nodes" as="node()*"/>
<xsl:param name="level" as="xs:integer"/>
<xsl:for-each-group select="$nodes" group-starting-with="line[xs:integer(elem[1]) eq $level]">
<xsl:element name="level{*[1]}">
<xsl:copy-of select="*"/>
<xsl:sequence select="mf:group(current-group() except ., $level + 1)"/>
</xsl:element>
</xsl:for-each-group>
</xsl:function>
<xsl:template name="main">
<xsl:variable name="flat">
<xsl:for-each select="tokenize(unparsed-text($text-url), '\r?\n')">
<line>
<xsl:for-each select="tokenize(., $sep)">
<elem name="{$field}{position()}">
<xsl:value-of select="."/>
</elem>
</xsl:for-each>
</line>
</xsl:for-each>
</xsl:variable>
<root>
<xsl:sequence select="mf:group($flat/line, 1)"/>
</root>
</xsl:template>
</xsl:stylesheet>
When I apply that stylesheet with Saxon 9, using java -jar saxon9he.jar -it:main -xsl:sheet.xsl, the result I get is
<?xml version="1.0" encoding="UTF-8"?>
<root>
<level001>
<elem name="field1">001</elem>
<elem name="field2">XXX</elem>
<elem name="field3">YYY</elem>
<level002>
<elem name="field1">002</elem>
<elem name="field2">AAA</elem>
<elem name="field3">BBB</elem>
</level002>
<level002>
<elem name="field1">002</elem>
<elem name="field2">CCC</elem>
<elem name="field3">DD</elem>
</level002>
</level001>
<level001>
<elem name="field1">001</elem>
<elem name="field2">EEF</elem>
<elem name="field3">XXX</elem>
<level002>
<elem name="field1">002</elem>
<elem name="field2">HHH</elem>
<elem name="field3">GGG</elem>
<level/>
</level002>
</level001>
</root>
The stylesheet has a parameter named text-url that holds the location of the plain text file; you can set it when you run the stylesheet.
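The stray `<level/>` element in the result above most likely comes from a line break at the end of the input file: `tokenize` then returns an empty final string, which becomes an empty `line` element, and `mf:group` wraps it in an element whose name is `level` followed by nothing. A possible fix, sketched here as a variant of the `flat` variable and not tested against the original file, is to drop blank lines before wrapping them:

```xml
<xsl:variable name="flat">
  <!-- The [normalize-space()] predicate keeps only lines that contain
       something other than whitespace. -->
  <xsl:for-each select="tokenize(unparsed-text($text-url), '\r?\n')[normalize-space()]">
    <line>
      <xsl:for-each select="tokenize(., $sep)">
        <elem name="{$field}{position()}">
          <xsl:value-of select="."/>
        </elem>
      </xsl:for-each>
    </line>
  </xsl:for-each>
</xsl:variable>
```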
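Answer 2 (score: 0)
Well, you are iterating over every line and closing the level001 tag as soon as you have finished that line. Why not try something like this (pseudocode):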
<level001>
<level002>
</level002>
</level001>
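One way to make that pseudocode concrete in XSLT 2.0 is sketched below. It handles exactly two levels and is not the recursive approach from the other answers. The parameter names `pathToCSV` and `encoding` are taken from the question; their default values, the `fields` mode and the intermediate `line` elements are assumptions made for this example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Assumed defaults; override them when running the stylesheet. -->
  <xsl:param name="pathToCSV" as="xs:string" select="'input.txt'"/>
  <xsl:param name="encoding" as="xs:string" select="'UTF-8'"/>

  <xsl:template name="main">
    <!-- Wrap every non-blank line in a <line> element, because the
         group-starting-with pattern in XSLT 2.0 has to match nodes. -->
    <xsl:variable name="lines">
      <xsl:for-each select="tokenize(unparsed-text($pathToCSV, $encoding), '\r?\n')[normalize-space()]">
        <line><xsl:value-of select="."/></line>
      </xsl:for-each>
    </xsl:variable>
    <root>
      <!-- A new group, and therefore a new level001, starts at each 001 line. -->
      <xsl:for-each-group select="$lines/line"
                          group-starting-with="line[starts-with(., '001|')]">
        <level001>
          <!-- The context item is the first line of the group. -->
          <xsl:apply-templates select="." mode="fields"/>
          <!-- Every remaining line of the group is nested inside it. -->
          <xsl:for-each select="current-group()[position() gt 1]">
            <level002>
              <xsl:apply-templates select="." mode="fields"/>
            </level002>
          </xsl:for-each>
        </level001>
      </xsl:for-each-group>
    </root>
  </xsl:template>

  <!-- Split one line on the pipe character and emit one elem per field. -->
  <xsl:template match="line" mode="fields">
    <xsl:for-each select="tokenize(., '\|')">
      <elem name="field{position()}">
        <xsl:value-of select="normalize-space(.)"/>
      </elem>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The key difference from the stylesheet in the question is that level001 is opened once per group instead of once per line. Two caveats: normalize-space also collapses whitespace inside a field, and if the file does not begin with a 001 line, the leading lines still end up in a level001 wrapper, because the first group always starts at the first item.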