XSL: splitting a file from node to node

Date: 2015-01-22 16:54:13

Tags: html xslt saxon

I need to split an HTML file into several HTML files, using the h1 nodes as separators. For example:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <p><span>This is my frontpage</span></p>
            <div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
        </div>
        <div>
            <h1> Title 1 </h1><p> some blabla for title_1 </p>
            <h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
            <h1> Title 2 </h1><p> some blabla for title_2 </p>
        </div>
        <div>
            <p> other blabla </p>
            <h1> Title 3 </h1><p> some blabla for title_3 </p>
        </div>
    </body>
</html>

I would like four output files:

frontpage.html:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <p><span>This is my frontpage</span></p>
            <div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
        </div>
    </body>
</html>

output1.html:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <h1> Title 1 </h1><p> some blabla for title_1 </p>
            <h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
        </div>
    </body>
</html>

output2.html:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <h1> Title 2 </h1><p> some blabla for title_2 </p>
        </div>
        <div>
            <p> other blabla </p>
        </div>
    </body>
</html>

output3.html:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <h1> Title 3 </h1><p> some blabla for title_3 </p>
        </div>
    </body>
</html>

I would be grateful for any ideas on how to solve this.

PS: I am using XSLT 2.0 with Saxon 8.

1 answer:

Answer 0 (score: 1)

Note that Saxon 8 is several years old, and versions before 8.9 did not implement the final XSLT 2.0 specification, only earlier drafts of it.

Here is an XSLT 2.0 stylesheet, tested with Saxon 9.6:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

<xsl:output method="html" version="4.01" indent="yes"/>

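<xsl:template match="/">
  <!-- Flatten the document into a sequence of "leaves": every h1, every text
       node outside an h1 and every element without child elements outside an
       h1. Each h1 starts a new group; the nodes before the first h1 (including
       those in head) form the first group, which becomes the front page. -->
  <xsl:for-each-group select="//h1 | //text()[not(ancestor::h1)] | //*[not(*) and not(ancestor::h1)]" group-starting-with="h1">
    <xsl:variable name="copy" select="current-group()"/>
    <!-- Every element that contains a node of the group has to be rebuilt
         around it in the output file. -->
    <xsl:variable name="ancestors" select="$copy/ancestor::*"/>
    <!-- The context item is the first node of the group and position() is the
         number of the group. The first group is the front page, so the h1
         groups are written to output1.html, output2.html, and so on. -->
    <xsl:variable name="filename" select="if (not(self::h1)) then 'frontpage.html' else concat('output', position() - 1, '.html')"/>
    <xsl:result-document href="{$filename}">
      <xsl:apply-templates select="/*">
        <xsl:with-param name="copy" select="$copy"/>
        <xsl:with-param name="ancestors" select="$ancestors"/>
      </xsl:apply-templates>
    </xsl:result-document>
  </xsl:for-each-group>
</xsl:template>

<!-- Nodes of the current group are copied completely; their ancestors are
     copied shallowly (with attributes) and their children processed again;
     all other nodes are dropped. -->
<xsl:template match="node()">
  <xsl:param name="copy"/>
  <xsl:param name="ancestors"/>
  <xsl:choose>
    <xsl:when test="$copy[. is current()]">
      <xsl:copy-of select="."/>
    </xsl:when>
    <xsl:when test="$ancestors[. is current()]">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates>
          <xsl:with-param name="copy" select="$copy"/>
          <xsl:with-param name="ancestors" select="$ancestors"/>
        </xsl:apply-templates>
      </xsl:copy>
    </xsl:when>
  </xsl:choose>
</xsl:template>

<!-- The head is copied unchanged into every output file. -->
<xsl:template match="head">
  <xsl:copy-of select="."/>
</xsl:template>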

</xsl:stylesheet>

Applied to the input file

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="Content-Style-Type" content="text/css" />
        <title>Test</title>
        <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
    </head>
    <body>
        <div>
            <p><span>This is my frontpage</span></p>
            <div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
        </div>
        <div>
            <h1> Title 1 </h1><p> some blabla for title_1 </p>
            <h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
            <h1> Title 2 </h1><p> some blabla for title_2 </p>
        </div>
        <div>
            <p> other blabla </p>
            <h1> Title 3 </h1><p> some blabla for title_3 </p>
        </div>
    </body>
</html>

it creates four output files, frontpage.html, output1.html, output2.html and output3.html, in that order:

<html>

   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


      <meta http-equiv="Content-Style-Type" content="text/css">

      <title>Test</title>
      <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
      </head>

   <body>

      <div>

         <p><span>This is my frontpage</span></p>

         <div><img src="images/frontpage.png" width="100" height="50" style="border:none"></div>

      </div>

      <div>

      </div>
   </body>
</html>

<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


      <meta http-equiv="Content-Style-Type" content="text/css">

      <title>Test</title>
      <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
      </head>
   <body>
      <div>
         <h1> Title 1 </h1>
         <p> some blabla for title_1 </p>

         <h2> Title 1.1 </h2>
         <p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50">

      </div>
   </body>
</html>

<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


      <meta http-equiv="Content-Style-Type" content="text/css">

      <title>Test</title>
      <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
      </head>
   <body>
      <div>
         <h1> Title 2 </h1>
         <p> some blabla for title_2 </p>

      </div>

      <div>

         <p> other blabla </p>

      </div>
   </body>
</html>

<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


      <meta http-equiv="Content-Style-Type" content="text/css">

      <title>Test</title>
      <style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
      </head>
   <body>
      <div>
         <h1> Title 3 </h1>
         <p> some blabla for title_3 </p>

      </div>

   </body>

</html>

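So I think the stylesheet splits the nodes as required and creates the right content in each file. What remains is whitespace: for example, the empty second div in frontpage.html comes from the whitespace-only text node in front of the first h1, which is grouped with the front page. You will need to experiment with whitespace stripping and indentation.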
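One thing to try for the whitespace is `xsl:strip-space`. The fragment below is an untested sketch that assumes it is placed at the top level of the stylesheet above, next to the existing `xsl:output` declaration. Whitespace-only text nodes are then removed from the source tree before the grouping runs, so the whitespace in front of the first h1 should no longer pull the second div into frontpage.html. The `xsl:preserve-space` line and its element list are an assumption, only relevant if the real input has inline markup where a space between two elements matters.

```xml
<!-- Top-level declaration: strip whitespace-only text nodes from every
     element of the source tree before the "/" template groups the nodes. -->
<xsl:strip-space elements="*"/>

<!-- Optional, hypothetical element list: keep whitespace-only text nodes
     inside these elements, e.g. the space in <p><b>a</b> <i>b</i></p>.
     A named preserve-space has a higher priority than strip-space "*". -->
<xsl:preserve-space elements="p span"/>

<!-- The existing output declaration from the stylesheet, unchanged. -->
<xsl:output method="html" version="4.01" indent="yes"/>
```

Because `elements="*"` also strips spaces between inline elements, listing such elements in `xsl:preserve-space` is safer than dropping the strip-space declaration altogether.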