XSL - Creating well-formed XML from a text file

Time: 2012-09-04 08:54:49

Tags: xml xslt xslt-2.0

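I have a pipe-delimited text file (shown below) that I need to convert into a well-formed XML structure (also shown below) using XSL. The XSL below is my latest attempt, but I cannot find a way to nest the 002-level elements inside their 001-level element, i.e. to preserve the parent-child relationship while iterating over the file line by line. Can anyone help?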

Pipe-delimited file - input

001|XXX|YYY
002|AAA|BBB
002|CCC|DD
001|EEF|XXX
002|HHH|GGG

XML file - desired output

<root>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">XXX</elem>
      <elem name="field3">YYY</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">AAA</elem>
         <elem name="field3">BBB</elem>
      </level002>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">CCC</elem>
         <elem name="field3">DD</elem>
      </level002>
   </level001>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">EEF</elem>
      <elem name="field3">XXX</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">HHH</elem>
         <elem name="field3">GGG</elem>
      </level002>
   </level001>
</root>

Current XSL

<xsl:variable name="Cols">
<col>field1,1</col>
<col>field2,2</col>
<col>field3,3</col> 
</xsl:variable>


 <xsl:template match="/" name="main">
<xsl:choose>
    <xsl:when test="unparsed-text-available($pathToCSV, $encoding)">
       <xsl:variable name="csv" select="unparsed-text($pathToCSV, $encoding)" />
       <xsl:variable name="lines" select="tokenize($csv, '\n')" as="xs:string+" />
       <root>
       <xsl:for-each select="$lines[position() &gt; 0]">
        <xsl:if test="translate(., '&#160; &#9;&#10;&#13;',  '') != ''">
            <level001>
            <xsl:variable name="line" select="." />
            <xsl:variable name="columns" select="tokenize(.,'\|')" as="xs:string+"/>    
            <xsl:choose>
                <xsl:when test="$columns[1]='001'">
                    <xsl:for-each select="$Cols/col">
                        <xsl:variable name="column" select="number(substring-after(.,','))"/>
                        <elem name="{substring-before(.,',')}">
                            <!-- trims the whitespace from the beginning and the ending of the value -->
                            <xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
                        </elem>
                    </xsl:for-each>
                </xsl:when>
                <xsl:when test="$columns[1]='002'">
                    <level002>
                    <xsl:for-each select="$Cols/col">
                        <xsl:variable name="column" select="number(substring-after(.,','))"/>
                        <elem name="{substring-before(.,',')}">
                            <!-- trims the whitespace from the beginning and the ending of the value -->
                            <xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
                        </elem>
                    </xsl:for-each>
                    </level002>
                </xsl:when>
            </xsl:choose>                               
            </level001>
        </xsl:if>
       </xsl:for-each>
       </root>
    </xsl:when>         
</xsl:choose>
</xsl:template>

3 answers:

Answer 0: (score: 1)

You can find a solution to essentially the same problem here:

http://www.saxonica.com/papers/ideadb-1.1/mhk-paper.xml

The core is a recursive grouping template:

<xsl:template name="process-level">
  <xsl:param name="population" required="yes" as="element()*"/>
  <xsl:param name="level" required="yes" as="xs:integer"/>
  <xsl:for-each-group select="$population" 
       group-starting-with="*[xs:integer(@level) eq $level]">
    <xsl:element name="{@tag}">
      <xsl:copy-of select="@ID[string(.)], @REF[string(.)]"/>
      <xsl:value-of select="normalize-space(@text)"/>
      <xsl:call-template name="process-level">
        <xsl:with-param name="population" 
                        select="current-group()[position() != 1]"/>
        <xsl:with-param name="level" 
                        select="$level + 1"/>
      </xsl:call-template>
    </xsl:element>
  </xsl:for-each-group>
</xsl:template>

Answer 1: (score: 1)

I would first convert the flat text into a flat XML structure and then group it with for-each-group group-starting-with, as in the following code sample:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="mf xs"
  version="2.0">

<xsl:param name="text-url" as="xs:string" select="'test2012090401.txt'"/>
<xsl:param name="sep" as="xs:string" select="'\|'"/>
<xsl:param name="field" as="xs:string" select="'field'"/>

<xsl:output indent="yes"/>

<xsl:function name="mf:group" as="node()*">
  <xsl:param name="nodes" as="node()*"/>
  <xsl:param name="level" as="xs:integer"/>
  <xsl:for-each-group select="$nodes" group-starting-with="line[xs:integer(elem[1]) eq $level]">
    <xsl:element name="level{*[1]}">
      <xsl:copy-of select="*"/>
      <xsl:sequence select="mf:group(current-group() except ., $level + 1)"/>
    </xsl:element>
  </xsl:for-each-group>
</xsl:function>

<xsl:template name="main">
  <xsl:variable name="flat">
    <xsl:for-each select="tokenize(unparsed-text($text-url), '\r?\n')">
      <line>
        <xsl:for-each select="tokenize(., $sep)">
          <elem name="{$field}{position()}">
            <xsl:value-of select="."/>
          </elem>
        </xsl:for-each>
      </line>
    </xsl:for-each>
  </xsl:variable>
  <root>
    <xsl:sequence select="mf:group($flat/line, 1)"/>
  </root>
</xsl:template>

</xsl:stylesheet>

When I apply that stylesheet with Saxon 9, using java -jar saxon9he.jar -it:main -xsl:sheet.xsl, the result I get is

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">XXX</elem>
      <elem name="field3">YYY</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">AAA</elem>
         <elem name="field3">BBB</elem>
      </level002>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">CCC</elem>
         <elem name="field3">DD</elem>
      </level002>
   </level001>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">EEF</elem>
      <elem name="field3">XXX</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">HHH</elem>
         <elem name="field3">GGG</elem>
         <level/>
      </level002>
   </level001>
</root>

The stylesheet has a parameter named text-url that holds the location of the plain text file; you can set it when you run the stylesheet.
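
Note the stray <level/> element in the result above. Most likely it comes from a trailing newline in the input: tokenize then yields a final empty string, which becomes an empty line element and ends up grouped at level 3. A minimal sketch of one way to avoid it, assuming blank lines carry no data, is to filter them out with a [normalize-space()] predicate before building the flat structure. The inline sample string is a stand-in for unparsed-text($text-url), and the flat wrapper element is hypothetical, present only to make the example self-contained:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <xsl:template name="main">
    <!-- Sample text ending in a newline; stands in for unparsed-text($text-url) -->
    <xsl:variable name="text" as="xs:string"
                  select="'001|XXX|YYY&#10;002|AAA|BBB&#10;'"/>
    <flat>
      <!-- The predicate drops empty and whitespace-only lines -->
      <xsl:for-each select="tokenize($text, '\r?\n')[normalize-space()]">
        <line>
          <xsl:for-each select="tokenize(., '\|')">
            <elem name="field{position()}">
              <xsl:value-of select="."/>
            </elem>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </flat>
  </xsl:template>

</xsl:stylesheet>
```

Applying the same predicate to the tokenize call inside the flat variable of the stylesheet above should keep the blank line out of the grouping step.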

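Answer 2: (score: 0)

Well, you are iterating over each line, and you have already closed the level001 tag by the time you finish that line. Why not try something like this (pseudocode):

    for each line
  • if the line is level001
  • print <level001>
  • get the index of the next level001 line
    • for each level002 line between this line and the next level001 line
    • print <level002>
    • print body002
    • print </level002>
  • print </level001>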
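
The pseudocode above can be sketched in XSLT 2.0 roughly as follows. This is only one possible reading of it, assuming that every 002 line belongs to the nearest preceding 001 line and that blank lines can be ignored. The parameter name pathToCSV is borrowed from the question; the file name input.txt and the named template fields are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical default; resolved relative to the stylesheet location -->
  <xsl:param name="pathToCSV" as="xs:string" select="'input.txt'"/>
  <xsl:output indent="yes"/>

  <xsl:template name="main">
    <!-- All non-blank lines of the input file -->
    <xsl:variable name="lines" as="xs:string*"
        select="tokenize(unparsed-text($pathToCSV), '\r?\n')[normalize-space(.)]"/>
    <!-- Positions of the 001 lines within $lines -->
    <xsl:variable name="starts" as="xs:integer*"
        select="for $i in 1 to count($lines)
                return $i[starts-with($lines[$i], '001|')]"/>
    <root>
      <xsl:for-each select="$starts">
        <xsl:variable name="start" as="xs:integer" select="."/>
        <xsl:variable name="pos" as="xs:integer" select="position()"/>
        <!-- Index of the next 001 line, or one past the last line -->
        <xsl:variable name="next" as="xs:integer"
            select="($starts[$pos + 1], count($lines) + 1)[1]"/>
        <level001>
          <xsl:call-template name="fields">
            <xsl:with-param name="line" select="$lines[$start]"/>
          </xsl:call-template>
          <!-- The 002 lines between this 001 line and the next one -->
          <xsl:for-each select="subsequence($lines, $start + 1,
                                  $next - $start - 1)[starts-with(., '002|')]">
            <level002>
              <xsl:call-template name="fields">
                <xsl:with-param name="line" select="."/>
              </xsl:call-template>
            </level002>
          </xsl:for-each>
        </level001>
      </xsl:for-each>
    </root>
  </xsl:template>

  <!-- Emits one elem per pipe-separated field of a single line -->
  <xsl:template name="fields">
    <xsl:param name="line" as="xs:string" required="yes"/>
    <xsl:for-each select="tokenize($line, '\|')">
      <elem name="field{position()}">
        <!-- normalize-space trims the value and also collapses inner whitespace -->
        <xsl:value-of select="normalize-space(.)"/>
      </elem>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because this reads the file with unparsed-text, it needs no source document and can be started from the named template, for example with Saxon's -it:main option. It handles exactly two levels; for deeper nesting, the recursive grouping shown in the other answers generalizes better.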