A tricky XSLT transformation

Time: 2011-09-23 05:57:05

Tags: xslt transformation

I have loosely structured XHTML data that I need to convert into better-structured XML.

Here is a sample:

<tbody>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td>
</tr>
<tr>
    <td>Green</td>
    <td>Round shaped</td>
    <td>Tasty</td>
</tr>
<tr>
    <td>Red</td>
    <td>Round shaped</td>
    <td>Bitter</td>
</tr>
<tr>
    <td>Pink</td>
    <td>Round shaped</td>
    <td>Tasty</td>
</tr>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td>
</tr>
<tr>
    <td>Red</td>
    <td>Heart shaped</td>
    <td>Super tasty</td>
</tr>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_bananas.gif"/><img src="http://www.abc.com/images/flag/congo.gif" alt="Congo"/> Third Grade</td>
</tr>
<tr>
    <td>Yellow</td>
    <td>Smile shaped</td>
    <td>Fairly tasty</td>
</tr>
<tr>
    <td>Brown</td>
    <td>Smile shaped</td>
    <td>Too sweet</td>
</tr>
</tbody>

I am trying to achieve the following structure:

<data>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Green</color>
        <shape>Round shaped</shape>
        <taste>Tasty</taste>
    </entry>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Red</color>
        <shape>Round shaped</shape>
        <taste>Bitter</taste>
    </entry>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Pink</color>
        <shape>Round shaped</shape>
        <taste>Tasty</taste>
    </entry>
    <entry>
        <type>Strawberries</type>
        <country>USA</country>
        <rank>Fifth Grade</rank>
        <color>Red</color>
        <shape>Heart shaped</shape>
        <taste>Super tasty</taste>
    </entry>
    <entry>
        <type>Bananas</type>
        <country>Congo</country>
        <rank>Third Grade</rank>
        <color>Yellow</color>
        <shape>Smile shaped</shape>
        <taste>Fairly tasty</taste>
    </entry>
    <entry>
        <type>Bananas</type>
        <country>Congo</country>
        <rank>Third Grade</rank>
        <color>Brown</color>
        <shape>Smile shaped</shape>
        <taste>Too sweet</taste>
    </entry>
</data>

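First, I need to extract the fruit type from tbody/tr/td/img[1]/@src, then the country from the tbody/tr/td/img[2]/@alt attribute, and finally the grade from the text of the tbody/tr/td itself.

Next, I need to populate all the entries under each category, with each entry carrying those values (as shown above).

But... as you can see, the data I have been given is very loosely structured. A category is just a td, followed by all the items in that category. To make matters worse, in my dataset the number of items under each category varies between 1 and 100...

I have tried a few approaches but can't seem to get it right. Any help is greatly appreciated. I know XSLT 2.0 introduces xsl:for-each-group, but I am limited to XSLT 1.0.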

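1 answer:

Answer 0: (score: 3)

In this case you are not really grouping elements. It is more like ungrouping them.

One way to do this is to use an xsl:key to look up the "header" row for each of the detail rows.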

<xsl:key name="fruity" 
   match="tr[not(td[@class='header'])]" 
   use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

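That is, each detail row is indexed by the generated id of the nearest preceding header row.

Next, you can select all the header cells like so: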
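To make the lookup concrete, here is a minimal sketch of the same association written without a key: starting from a header row, it selects the following detail rows whose nearest preceding header is that same row. The `groups` and `group` element names are hypothetical and exist only for this illustration. It rescans the siblings for every header row, which is the repeated work the key avoids.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <xsl:template match="/tbody">
      <groups>
         <!-- Visit header rows only -->
         <xsl:apply-templates select="tr[td[@class='header']]"/>
      </groups>
   </xsl:template>

   <!-- Context node: one header row -->
   <xsl:template match="tr">
      <!-- Capture this row's id first; inside the predicate below, "." is the candidate detail row -->
      <xsl:variable name="header-id" select="generate-id()"/>
      <!-- Detail rows whose nearest preceding header row is this one -->
      <xsl:variable name="details"
         select="following-sibling::tr[not(td[@class='header'])]
                 [generate-id(preceding-sibling::tr[td[@class='header']][1]) = $header-id]"/>
      <group header="{normalize-space(td)}" rows="{count($details)}"/>
   </xsl:template>
</xsl:stylesheet>
```

For the sample input this should report 3, 1 and 2 detail rows for the three headers, the same sets that key('fruity', ...) returns for each header row.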

<xsl:apply-templates select="tr/td[@class='header']"/>

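In the matching template you can then extract the type, country and rank. Then, to get the associated detail rows, simply look up the key for the parent row: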

<xsl:apply-templates select="key('fruity', generate-id(..))"/>

Here is the complete XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <xsl:key name="fruity" 
      match="tr[not(td[@class='header'])]" 
      use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

   <xsl:template match="/tbody">
      <data>
         <!-- Match header rows -->
         <xsl:apply-templates select="tr/td[@class='header']"/>
      </data>
   </xsl:template>

   <xsl:template match="td">
      <!-- Match associated detail rows -->
      <xsl:apply-templates select="key('fruity', generate-id(..))">
         <!-- Extract relevant parameters from the td cell -->
         <xsl:with-param name="type" select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
         <xsl:with-param name="country" select="img[2]/@alt"/>
         <xsl:with-param name="rank" select="normalize-space(text())"/>
      </xsl:apply-templates>
   </xsl:template>

   <xsl:template match="tr">
      <xsl:param name="type"/>
      <xsl:param name="country"/>
      <xsl:param name="rank"/>
      <entry>
         <type>
            <xsl:value-of select="$type"/>
         </type>
         <country>
            <xsl:value-of select="$country"/>
         </country>
         <rank>
            <xsl:value-of select="$rank"/>
         </rank>
         <color>
            <xsl:value-of select="td[1]"/>
         </color>
         <shape>
            <xsl:value-of select="td[2]"/>
         </shape>
         <taste>
            <xsl:value-of select="td[3]"/>
         </taste>
      </entry>
   </xsl:template>
</xsl:stylesheet>

When applied to your input document, the following output is produced:

<data>
   <entry>
      <type>apples</type>
      <country>Portugal</country>
      <rank>First Grade</rank>
      <color>Green</color>
      <shape>Round shaped</shape>
      <taste>Tasty</taste>
   </entry>
   <entry>
      <type>apples</type>
      <country>Portugal</country>
      <rank>First Grade</rank>
      <color>Red</color>
      <shape>Round shaped</shape>
      <taste>Bitter</taste>
   </entry>
   <entry>
      <type>apples</type>
      <country>Portugal</country>
      <rank>First Grade</rank>
      <color>Pink</color>
      <shape>Round shaped</shape>
      <taste>Tasty</taste>
   </entry>
   <entry>
      <type>strawberries</type>
      <country>USA</country>
      <rank>Fifth Grade</rank>
      <color>Red</color>
      <shape>Heart shaped</shape>
      <taste>Super tasty</taste>
   </entry>
   <entry>
      <type>bananas</type>
      <country>Congo</country>
      <rank>Third Grade</rank>
      <color>Yellow</color>
      <shape>Smile shaped</shape>
      <taste>Fairly tasty</taste>
   </entry>
   <entry>
      <type>bananas</type>
      <country>Congo</country>
      <rank>Third Grade</rank>
      <color>Brown</color>
      <shape>Smile shaped</shape>
      <taste>Too sweet</taste>
   </entry>
</data>
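Note that the type values come out in lowercase (apples), because they are taken from the icon file name, whereas the question asked for Apples. XSLT 1.0 has no upper-case() function, so one possible workaround is translate() on the first character. The following is a sketch of a drop-in replacement for the td template in the stylesheet above; the `raw` variable name is introduced here for illustration, and the approach assumes the file names contain only unaccented ASCII letters.

```xml
<xsl:template match="td">
   <!-- Raw type taken from the icon file name, e.g. "apples" -->
   <xsl:variable name="raw"
      select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
   <!-- Match associated detail rows -->
   <xsl:apply-templates select="key('fruity', generate-id(..))">
      <!-- Upper-case the first character (ASCII only) and append the rest unchanged -->
      <xsl:with-param name="type"
         select="concat(translate(substring($raw, 1, 1),
                                  'abcdefghijklmnopqrstuvwxyz',
                                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
                        substring($raw, 2))"/>
      <xsl:with-param name="country" select="img[2]/@alt"/>
      <xsl:with-param name="rank" select="normalize-space(text())"/>
   </xsl:apply-templates>
</xsl:template>
```

The tr template does not need to change, since it only prints the parameters it receives.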