What XPath expressions in an XSLT file will transform the HTML below into the XML below?

Time: 2015-03-11 12:07:44

Tags: java xml xslt xpath

I want the following table data:

<html>
<table border="1">
<tr>
<td rowspan="2">2015</td>
<td>First Event of 2015</td>
</tr>
<tr><td>Second Event of 2015</td></tr>
<tr>
<td rowspan="2">2014</td>
<td>First Event of 2014</td>
</tr>
<tr><td>Second Event of 2014</td></tr>
</table>
</html>

converted to the following XML using XPath:

<events>
<event year="2015" name="First Event of 2015"/>
<event year="2015" name="Second Event of 2015"/>
<event year="2014" name="First Event of 2014"/>
<event year="2014" name="Second Event of 2014"/>
</events>

How do I handle the rowspans in XPath to get this output?

For the record, I am using the following Java code to perform the XSLT transformation:

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

String xsltCode = ... // the XSLT I'm asking for
File xmlInput = ... // the file with the HTML code above
File xmlOutput = new File("output.xml");
// Compile the stylesheet held in the string
Transformer transformer = TransformerFactory.newInstance().newTransformer(new StreamSource(new StringReader(xsltCode)));
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
Source xmlSource = new StreamSource(xmlInput);
Result resultOutput = new StreamResult(xmlOutput);
// Apply the stylesheet to the input and write output.xml
transformer.transform(xmlSource, resultOutput);

2 answers:

Answer 0 (score: 6)

I'm glad we finally found out what you need. Please try to make your future questions clear from the start; it will save you time and votes.

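Write a first template that matches / and outputs the outermost element of the result, events. Then write a second template that matches all td elements that do not have a @rowspan attribute. The year must be taken from the nearest preceding td element that does have a @rowspan attribute.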
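The year lookup described above can be checked on its own, outside XSLT. The following is a minimal sketch that uses the JDK's built-in `javax.xml.xpath` API on the question's table; the class name `PrecedingYearDemo` and the method `describe` are made up for illustration. Because `preceding::` is a reverse axis, the `[1]` predicate picks the nearest preceding `td` with a `rowspan`, not the first one in the document.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PrecedingYearDemo {
    // The table from the question, written compactly as well-formed XML.
    static final String HTML =
          "<html><table border=\"1\">"
        + "<tr><td rowspan=\"2\">2015</td><td>First Event of 2015</td></tr>"
        + "<tr><td>Second Event of 2015</td></tr>"
        + "<tr><td rowspan=\"2\">2014</td><td>First Event of 2014</td></tr>"
        + "<tr><td>Second Event of 2014</td></tr>"
        + "</table></html>";

    // Returns one "year | event" line per td that has no rowspan attribute.
    static String describe() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList cells = (NodeList) xpath.evaluate(
            "//td[not(@rowspan)]",
            new InputSource(new StringReader(HTML)),
            XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.getLength(); i++) {
            Node cell = cells.item(i);
            // Same expression as in the stylesheet, evaluated relative to the cell.
            String year = xpath.evaluate("preceding::td[@rowspan][1]", cell);
            sb.append(year).append(" | ").append(cell.getTextContent()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe());
    }
}
```

Each event cell should be paired with the year cell that most recently opened a rowspan, which is the behaviour the second template relies on.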

XSLT stylesheet

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />

    <xsl:strip-space elements="*"/>

    <xsl:template match="/">
      <events>
          <xsl:apply-templates/>
      </events>
    </xsl:template>

    <xsl:template match="td[not(@rowspan)]">
        <event year="{preceding::td[@rowspan][1]}">
            <xsl:value-of select="."/>
        </event>
    </xsl:template>

    <xsl:template match="text()"/>
</xsl:transform>

XML output

<?xml version="1.0" encoding="UTF-8"?>
<events>
   <event year="2015">First Event of 2015</event>
   <event year="2015">Second Event of 2015</event>
   <event year="2014">First Event of 2014</event>
   <event year="2014">Second Event of 2014</event>
</events>

Try this solution online here.
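The stylesheet above writes the event name as element text, whereas the question asked for a name attribute. A possible variant, offered as a sketch and not as part of the original answer, sets both attributes through attribute value templates. It is wrapped in Java in the same way as the question's code, except that it reads and writes strings instead of files; the class name `NameAttributeVariant` and the method `transform` are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NameAttributeVariant {
    // Variant of the answer's stylesheet: the event name goes into @name.
    static final String XSLT =
          "<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"xml\" encoding=\"UTF-8\" indent=\"yes\"/>"
        + "<xsl:template match=\"/\">"
        + "<events><xsl:apply-templates/></events>"
        + "</xsl:template>"
        + "<xsl:template match=\"td[not(@rowspan)]\">"
        + "<event year=\"{preceding::td[@rowspan][1]}\" name=\"{.}\"/>"
        + "</xsl:template>"
        + "<xsl:template match=\"text()\"/>"
        + "</xsl:transform>";

    // The table from the question, written compactly as well-formed XML.
    static final String HTML =
          "<html><table border=\"1\">"
        + "<tr><td rowspan=\"2\">2015</td><td>First Event of 2015</td></tr>"
        + "<tr><td>Second Event of 2015</td></tr>"
        + "<tr><td rowspan=\"2\">2014</td><td>First Event of 2014</td></tr>"
        + "<tr><td>Second Event of 2014</td></tr>"
        + "</table></html>";

    // Applies the stylesheet to the input and returns the serialized result.
    static String transform(String xslt, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XSLT, HTML));
    }
}
```

With the question's input this should yield an events element containing four empty event elements, each carrying a year and a name attribute. The order in which the serializer writes the two attributes is not something the stylesheet controls.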

Answer 1 (score: 2)

Assuming the given example is oversimplified, and that the real input may also contain years with only a single event, I would suggest:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="table">
    <events>
        <xsl:apply-templates select="tr"/>
    </events>
</xsl:template>

<xsl:template match="tr">
    <event>
        <xsl:attribute name="year">
            <xsl:value-of select="(. | preceding-sibling::tr)[count(td)=2][last()]/td[1]"/>
        </xsl:attribute>
        <xsl:value-of select="td[last()]"/>
    </event>
</xsl:template>

</xsl:stylesheet>

Applied to the following test input:

<html>
  <table border="1">
    <tr>
      <td rowspan="2">2015</td>
      <td>First Event of 2015</td>
    </tr>
    <tr>
      <td>Second Event of 2015</td>
    </tr>
    <tr>
      <td rowspan="2">2014</td>
      <td>First Event of 2014</td>
    </tr>
    <tr>
      <td>Second Event of 2014</td>
    </tr>
    <tr>
      <td>Third Event of 2014</td>
    </tr>
    <tr>
      <td>2013</td>
      <td>Only Event of 2013</td>
    </tr>
  </table>
</html>

the result will be:

<?xml version="1.0" encoding="UTF-8"?>
<events>
   <event year="2015">First Event of 2015</event>
   <event year="2015">Second Event of 2015</event>
   <event year="2014">First Event of 2014</event>
   <event year="2014">Second Event of 2014</event>
   <event year="2014">Third Event of 2014</event>
   <event year="2013">Only Event of 2013</event>
</events>
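The second answer's year expression, `(. | preceding-sibling::tr)[count(td)=2][last()]/td[1]`, reads as follows: take the current row together with all earlier sibling rows, keep only the rows that have two cells, pick the last of those in document order, and return its first cell. The sketch below evaluates it row by row with the JDK's `javax.xml.xpath` API against the extended test input; the class name `RowYearLookup` and the method `describe` are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RowYearLookup {
    // The extended test input from the second answer, written compactly.
    static final String HTML =
          "<html><table border=\"1\">"
        + "<tr><td rowspan=\"2\">2015</td><td>First Event of 2015</td></tr>"
        + "<tr><td>Second Event of 2015</td></tr>"
        + "<tr><td rowspan=\"2\">2014</td><td>First Event of 2014</td></tr>"
        + "<tr><td>Second Event of 2014</td></tr>"
        + "<tr><td>Third Event of 2014</td></tr>"
        + "<tr><td>2013</td><td>Only Event of 2013</td></tr>"
        + "</table></html>";

    // Returns one "year | event" line per table row.
    static String describe() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList rows = (NodeList) xpath.evaluate(
            "//tr",
            new InputSource(new StringReader(HTML)),
            XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows.getLength(); i++) {
            Node row = rows.item(i);
            // Nearest row at or before this one that has two cells: its first cell is the year.
            String year = xpath.evaluate(
                "(. | preceding-sibling::tr)[count(td)=2][last()]/td[1]", row);
            // The event name is always the last cell of the current row.
            String name = xpath.evaluate("td[last()]", row);
            sb.append(year).append(" | ").append(name).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe());
    }
}
```

This approach keys on the number of cells in a row, not on the rowspan attribute, so the 2013 row, which has a year cell without rowspan, still resolves to its own year. The trade-off is that it assumes every row that introduces a year has exactly two cells.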