I have some loosely structured XHTML data that I need to convert into better-structured XML.
Here is an example of the input:
<tbody>
<tr>
<td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td>
</tr>
<tr>
<td>Green</td>
<td>Round shaped</td>
<td>Tasty</td>
</tr>
<tr>
<td>Red</td>
<td>Round shaped</td>
<td>Bitter</td>
</tr>
<tr>
<td>Pink</td>
<td>Round shaped</td>
<td>Tasty</td>
</tr>
<tr>
<td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td>
</tr>
<tr>
<td>Red</td>
<td>Heart shaped</td>
<td>Super tasty</td>
</tr>
<tr>
<td class="header"><img src="http://www.abc.com/images/icon_bananas.gif"/><img src="http://www.abc.com/images/flag/congo.gif" alt="Congo"/> Third Grade</td>
</tr>
<tr>
<td>Yellow</td>
<td>Smile shaped</td>
<td>Fairly tasty</td>
</tr>
<tr>
<td>Brown</td>
<td>Smile shaped</td>
<td>Too sweet</td>
</tr>
This is the structure I'm trying to achieve:
<data>
<entry>
<type>Apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Green</color>
<shape>Round shaped</shape>
<taste>Tasty</taste>
</entry>
<entry>
<type>Apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Red</color>
<shape>Round shaped</shape>
<taste>Bitter</taste>
</entry>
<entry>
<type>Apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Pink</color>
<shape>Round shaped</shape>
<taste>Tasty</taste>
</entry>
<entry>
<type>Strawberries</type>
<country>USA</country>
<rank>Fifth Grade</rank>
<color>Red</color>
<shape>Heart shaped</shape>
<taste>Super tasty</taste>
</entry>
<entry>
<type>Bananas</type>
<country>Congo</country>
<rank>Third Grade</rank>
<color>Yellow</color>
<shape>Smile shaped</shape>
<taste>Fairly tasty</taste>
</entry>
<entry>
<type>Bananas</type>
<country>Congo</country>
<rank>Third Grade</rank>
<color>Brown</color>
<shape>Smile shaped</shape>
<taste>Too sweet</taste>
</entry>
</data>
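First, I need to extract the fruit type from tbody/tr/td/img[1]/@src, then the country from the tbody/tr/td/img[2]/@alt attribute, and finally the grade from the text of tbody/tr/td itself.
Next, I need to fill in every entry under each category so that each entry carries those values (as shown above).
But... as you can see, the data I've been given is very loosely structured. A category is just a header td in its own row, followed by rows for all the items in that category. Worse, in my dataset the number of items under each category varies anywhere from 1 to 100...
I've tried a few approaches but can't seem to get it working. Any help is greatly appreciated. I know XSLT 2.0 introduced xsl:for-each-group, but I'm limited to XSLT 1.0.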
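Answer (score: 3)
In this case you are not really grouping elements; it is more like ungrouping them.
One way to do this is to use an xsl:key to look up the "header" row for each detail row.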
<xsl:key name="fruity"
match="tr[not(td[@class='header'])]"
use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>
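That is, each detail row is indexed under the generate-id() of the nearest preceding header row.
Next, you can apply templates to all the header cells like this: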
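To see what the key computes, here is a minimal sketch of the same "nearest preceding header" mapping in Python, using only the standard-library xml.etree.ElementTree (an illustration outside XSLT 1.0, not part of the stylesheet). The names TBODY, is_header and build_fruity_index are hypothetical, TBODY is a shortened copy of the sample input, and id() of the header element stands in for generate-id(). The dictionary plays the role of the fruity key. One difference: the XSLT key files detail rows that appear before any header under the empty string (generate-id() of an empty node-set), whereas this sketch simply skips them.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample input (two categories).
TBODY = """<tbody>
<tr><td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td></tr>
<tr><td>Green</td><td>Round shaped</td><td>Tasty</td></tr>
<tr><td>Red</td><td>Round shaped</td><td>Bitter</td></tr>
<tr><td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td></tr>
<tr><td>Red</td><td>Heart shaped</td><td>Super tasty</td></tr>
</tbody>"""


def is_header(tr):
    """True if the row contains a td with class="header"."""
    return tr.find("td[@class='header']") is not None


def build_fruity_index(tbody):
    """Map id(header row) -> the detail rows that follow it, up to the next header."""
    index = {}
    current = None  # most recent header row seen so far
    for tr in tbody.findall("tr"):
        if is_header(tr):
            current = tr
            index[id(tr)] = []
        elif current is not None:  # rows before the first header are skipped
            index[id(current)].append(tr)
    return index


tbody = ET.fromstring(TBODY)
index = build_fruity_index(tbody)
headers = [tr for tr in tbody.findall("tr") if is_header(tr)]
# Rough equivalent of key('fruity', generate-id(..)) for each header row:
colors = [[row.find("td").text for row in index[id(h)]] for h in headers]
print(colors)  # [['Green', 'Red'], ['Red']]
```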
<xsl:apply-templates select="tr/td[@class='header']"/>
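In the template that matches them, you can extract the type, country and grade. Then, to get the associated detail rows, just look up the key using the parent row's ID:
<xsl:apply-templates select="key('fruity', generate-id(..))"/>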
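The extraction step can likewise be sketched in Python with the standard-library ElementTree, purely as an illustration. parse_header is a hypothetical helper, and it assumes each header cell contains exactly two img elements as in the sample. It takes the text between images/icon_ and .gif in the first image's src as the type (empty if either marker is missing, like substring-after/substring-before), the second image's alt as the country, and the cell's first text node with whitespace collapsed (roughly what normalize-space() does) as the grade.

```python
import xml.etree.ElementTree as ET


def parse_header(td):
    """Return (type, country, rank) for a <td class="header"> cell.

    Assumes the cell holds two <img> elements, as in the sample input.
    """
    imgs = td.findall("img")
    src = imgs[0].get("src", "")
    # substring-after(src, 'images/icon_'), then substring-before(..., '.gif')
    after = src.split("images/icon_", 1)[1] if "images/icon_" in src else ""
    fruit = after.split(".gif", 1)[0] if ".gif" in after else ""
    country = imgs[1].get("alt", "")
    # First text node of the cell (td.text or a child's tail), whitespace collapsed.
    texts = [t for t in [td.text] + [child.tail for child in td] if t]
    rank = " ".join(texts[0].split()) if texts else ""
    return fruit, country, rank


cell = ET.fromstring(
    '<td class="header">'
    '<img src="http://www.abc.com/images/icon_apples.gif"/>'
    '<img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/>'
    ' First Grade</td>'
)
print(parse_header(cell))  # ('apples', 'Portugal', 'First Grade')
```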
Here is the complete XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:key name="fruity"
           match="tr[not(td[@class='header'])]"
           use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>
  <xsl:template match="/tbody">
    <data>
      <!-- Match header rows -->
      <xsl:apply-templates select="tr/td[@class='header']"/>
    </data>
  </xsl:template>
  <xsl:template match="td">
    <!-- Match associated detail rows -->
    <xsl:apply-templates select="key('fruity', generate-id(..))">
      <!-- Extract relevant parameters from the td cell -->
      <xsl:with-param name="type" select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
      <xsl:with-param name="country" select="img[2]/@alt"/>
      <xsl:with-param name="rank" select="normalize-space(text())"/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="tr">
    <xsl:param name="type"/>
    <xsl:param name="country"/>
    <xsl:param name="rank"/>
    <entry>
      <type>
        <xsl:value-of select="$type"/>
      </type>
      <country>
        <xsl:value-of select="$country"/>
      </country>
      <rank>
        <xsl:value-of select="$rank"/>
      </rank>
      <color>
        <xsl:value-of select="td[1]"/>
      </color>
      <shape>
        <xsl:value-of select="td[2]"/>
      </shape>
      <taste>
        <xsl:value-of select="td[3]"/>
      </taste>
    </entry>
  </xsl:template>
</xsl:stylesheet>
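When applied to your input document, this produces the following output. Note that type is copied verbatim from the image file name, so it comes out lowercase (apples) rather than capitalized as in your target structure: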
<data>
<entry>
<type>apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Green</color>
<shape>Round shaped</shape>
<taste>Tasty</taste>
</entry>
<entry>
<type>apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Red</color>
<shape>Round shaped</shape>
<taste>Bitter</taste>
</entry>
<entry>
<type>apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Pink</color>
<shape>Round shaped</shape>
<taste>Tasty</taste>
</entry>
<entry>
<type>strawberries</type>
<country>USA</country>
<rank>Fifth Grade</rank>
<color>Red</color>
<shape>Heart shaped</shape>
<taste>Super tasty</taste>
</entry>
<entry>
<type>bananas</type>
<country>Congo</country>
<rank>Third Grade</rank>
<color>Yellow</color>
<shape>Smile shaped</shape>
<taste>Fairly tasty</taste>
</entry>
<entry>
<type>bananas</type>
<country>Congo</country>
<rank>Third Grade</rank>
<color>Brown</color>
<shape>Smile shaped</shape>
<taste>Too sweet</taste>
</entry>
</data>
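For comparison, the whole flattening can be sketched in Python with the standard-library ElementTree (an illustration only, not an XSLT 1.0 solution). transform and header_fields are hypothetical names, and TBODY is a shortened copy of the sample input. Instead of a key, it makes one forward pass and remembers the fields of the most recent header row. This produces the entries in the same order as the stylesheet: each header in document order, followed by its detail rows.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample input (two categories).
TBODY = """<tbody>
<tr><td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td></tr>
<tr><td>Green</td><td>Round shaped</td><td>Tasty</td></tr>
<tr><td>Red</td><td>Round shaped</td><td>Bitter</td></tr>
<tr><td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td></tr>
<tr><td>Red</td><td>Heart shaped</td><td>Super tasty</td></tr>
</tbody>"""

FIELDS = ("type", "country", "rank", "color", "shape", "taste")


def header_fields(td):
    """(type, country, rank) from a header cell; assumes two <img> elements."""
    imgs = td.findall("img")
    src = imgs[0].get("src", "")
    after = src.split("images/icon_", 1)[1] if "images/icon_" in src else ""
    fruit = after.split(".gif", 1)[0] if ".gif" in after else ""
    country = imgs[1].get("alt", "")
    # First text node of the cell, whitespace collapsed like normalize-space().
    texts = [t for t in [td.text] + [child.tail for child in td] if t]
    rank = " ".join(texts[0].split()) if texts else ""
    return fruit, country, rank


def transform(tbody):
    """Flatten header/detail rows into <data><entry>...</entry></data>."""
    data = ET.Element("data")
    current = None  # fields of the most recent header row
    for tr in tbody.findall("tr"):
        header = tr.find("td[@class='header']")
        if header is not None:
            current = header_fields(header)
        elif current is not None:  # detail rows before any header are skipped
            cells = ["".join(td.itertext()) for td in tr.findall("td")]
            cells += [""] * (3 - len(cells))  # missing cells become empty strings
            entry = ET.SubElement(data, "entry")
            for tag, value in zip(FIELDS, list(current) + cells[:3]):
                ET.SubElement(entry, tag).text = value
    return data


result = transform(ET.fromstring(TBODY))
print(ET.tostring(result, encoding="unicode"))
```

The printed XML is not indented, but for this shortened input it contains the same first four entries as the XSLT output above.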