一种使用XSL将大型XML文件拆分为较小的xml文件的方法

时间:2010-03-11 10:36:46

标签: xml xslt xalan filesplitting

我收到一个包含电视广播列表的巨大XML文件。而且我必须将它分成包含所有广播的小文件,仅限一天。我设法达到了这个目的,但xml标题有两个问题,并且有多个节点在那里。

XML的结构如下:

<?xml version="1.0" encoding="UTF-8"?>
<broadcasts>
    <broadcast>
    <id>4637445812</id>
    <week>39</week>
    <date>2009-09-22</date>
    <time>21:45:00:00</time>
        ... (some more)
    </broadcast>
    ... (long list of broadcast nodes)
</broadcasts>

我的XSL看起来像这样:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:redirect="http://xml.apache.org/xalan/redirect"
        extension-element-prefixes="redirect"
        version="1.0">
    <!-- mark the CDATA escaped tags -->
    <xsl:output method="xml" cdata-section-elements="title text"
        indent="yes" omit-xml-declaration="no" />

    <xsl:template match="broadcasts">
        <xsl:apply-templates />
    </xsl:template>

    <xsl:template match="broadcast">
    <!-- Build filename PRG_YYYYMMDD.xml -->
    <xsl:variable name="filename" select="concat(substring(date,1,4),substring(date,6,2))"/>
    <xsl:variable name="filename" select="concat($filename,substring(date,9,2))" />
    <xsl:variable name="filename" select="concat($filename,'.xml')" />
    <redirect:write select="concat('PRG_',$filename)" append="true">    

        <schedule>  
        <broadcast program="TEST">
            <!-- format timestamp in specific way -->
            <xsl:variable name="tmstmp" select="concat(substring(date,9,2),'/')"/>
            <xsl:variable name="tmstmp" select="concat($tmstmp,substring(date,6,2))"/>
            <xsl:variable name="tmstmp" select="concat($tmstmp,'/')"/>
            <xsl:variable name="tmstmp" select="concat($tmstmp,substring(date,1,4))"/>
            <xsl:variable name="tmstmp" select="concat($tmstmp,' ')"/>
            <xsl:variable name="tmstmp" select="concat($tmstmp,substring(time,1,5))"/>

            <timestamp><xsl:value-of select="$tmstmp"/></timestamp>
            <xsl:copy-of select="title"/>
            <text><xsl:value-of select="subtitle"/></text>

            <xsl:variable name="newVps" select="concat(substring(vps,1,2),substring(vps,4,2))"/>
            <xsl:variable name="newVps" select="concat($newVps,substring(vps,7,2))"/>
            <xsl:variable name="newVps" select="concat($newVps,substring(vps,10,2))"/>
            <vps><xsl:value-of select="$newVps"/></vps>                    
            <nextday>false</nextday>               
        </broadcast>      
        </schedule>
    </redirect:write>
    </xsl:template> 
</xsl:stylesheet>

我的输出XML是这样的:

PRG_20090512.xml:

<?xml version="1.0" encoding="UTF-8"?>
  <schedule>
    <broadcast program="TEST">
      <timestamp>01/03/2010 06:00</timestamp>
      <title><![CDATA[TELEKOLLEG  Geschichte ]]></title>
      <text><![CDATA[Giganten in Fernost]]></text>
      <vps>06000000</vps>
      <nextday>false</nextday>
    </broadcast>
  </schedule>
<?xml version="1.0" encoding="UTF-8"?> <!-- don't want this -->
  <schedule>  <!-- don't want this -->
    <broadcast program="TEST">
      <timestamp>01/03/2010 06:30</timestamp>
      <title><![CDATA[Die chemische Bindung]]></title>
      <text/>
      <vps>06300000</vps>
      <nextday>false</nextday>
    </broadcast>
  </schedule>
<?xml version="1.0" encoding="UTF-8"?>
...and so on

我可以在输出声明中输入omit-xml-declaration =“yes”,但是我没有任何xml标头。我试图检查标签是否已经在输出中,但未能在输出中选择节点......

这就是我的尝试:

<xsl:choose>
  <xsl:when test="count(schedule) = 0"> <!-- schedule needed -->   
    <schedule>
      <broadcast>
    ...
  <xsl:otherwise> <!-- no schedule needed -->
    <broadcast>
    ...

感谢您的帮助,因为我不知道如何处理。 ( YeTI

2 个答案:

答案 0 :(得分:1)

一次写一个文件,包含该日期的所有广播。

这成为按日期对输入元素进行分组的问题。由于Xalan是XSLT 1.0,所以你可以使用密钥。

我们定义按日期分组广播的密钥。我们选择每个广播,这是它的第一个日期。然后使用键功能选择同一日期的所有广播。

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:redirect="http://xml.apache.org/xalan/redirect"
                extension-element-prefixes="redirect"
                version="1.0">

    <!-- mark the CDATA escaped tags --> 
    <xsl:output method="xml" cdata-section-elements="title text" indent="yes" omit-xml-declaration="no" />

    <xsl:key name="date" match="broadcast" use="date" />

    <xsl:template match="broadcasts">
        <xsl:apply-templates select="broadcast[generate-id(.)=generate-id(key('date',date)[1])]"/>
    </xsl:template>

    <xsl:template match="broadcast">
        <!-- Build filename PRG_YYYYMMDD.xml -->
        <xsl:variable name="filename" select="concat(substring(date,1,4),substring(date,6,2))"/>
        <xsl:variable name="filename" select="concat($filename,substring(date,9,2))" />
        <xsl:variable name="filename" select="concat($filename,'.xml')" />

        <redirect:write select="concat('PRG_',$filename)" append="true">        

            <schedule>
                <xsl:apply-templates select="key('date',date)" mode="broadcast" />
            </schedule>

        </redirect:write>

    </xsl:template>

    <xsl:template match="broadcast" mode="broadcast">
        <broadcast program="TEST">
            <!-- format timestamp in specific way -->
            <xsl:variable name="tmstmp" select="concat(substring(date,9,2),'/')"/>
            <xsl:variable name="tmstmp" select="concat($tmstmp,substring(date,6,2))"/>
            <xsl:variable name="tmstmp" select="concat($tmstmp,'/')"/>
            <xsl:variable name="tmstmp" select="concat($tmstmp,substring(date,1,4))"/>
            <xsl:variable name="tmstmp" select="concat($tmstmp,' ')"/>
            <xsl:variable name="tmstmp" select="concat($tmstmp,substring(time,1,5))"/>

            <timestamp><xsl:value-of select="$tmstmp"/></timestamp>
            <xsl:copy-of select="title"/>
            <text><xsl:value-of select="subtitle"/></text>

            <xsl:variable name="newVps" select="concat(substring(vps,1,2),substring(vps,4,2))"/>
            <xsl:variable name="newVps" select="concat($newVps,substring(vps,7,2))"/>
            <xsl:variable name="newVps" select="concat($newVps,substring(vps,10,2))"/>
            <vps><xsl:value-of select="$newVps"/></vps>                                     
            <nextday>false</nextday>                             
        </broadcast>
    </xsl:template>

</xsl:stylesheet>

答案 1 :(得分:0)

使用唯一的父级包裹您的计划元素,看看是否会导致问题消失。

我不熟悉这个特殊问题,但我的猜测是,这是因为您尝试使用多个顶级元素生成XML文档。每个XML文档必须只有一个顶级元素(如果你问我,这是一个愚蠢的要求,例如它使XML不适用于日志文件,但就是这样)。