I need to split an HTML file into multiple HTML files, using the h1 nodes as the split points. For example:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<p><span>This is my frontpage</span></p>
<div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
</div>
<div>
<h1> Title 1 </h1><p> some blabla for title_1 </p>
<h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
<h1> Title 2 </h1><p> some blabla for title_2 </p>
</div>
<div>
<p> other blabla </p>
<h1> Title 3 </h1><p> some blabla for title_3 </p>
</div>
</body>
</html>
I would like four outputs.
frontpage.html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<p><span>This is my frontpage</span></p>
<div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
</div>
</body>
</html>
output1.html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 1 </h1><p> some blabla for title_1 </p>
<h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
</div>
</body>
</html>
output2.html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 2 </h1><p> some blabla for title_2 </p>
</div>
<div>
<p> other blabla </p>
</div>
</body>
</html>
output3.html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 3 </h1><p> some blabla for title_3 </p>
</div>
</body>
</html>
I would appreciate any ideas on how to solve this.
PS: I am using XSLT 2.0 and Saxon 8.
Answer 0 (score: 1)
Note that Saxon 8 is several years old; releases before 8.9 do not implement the final XSLT 2.0 specification, only earlier drafts of it.
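If you are unsure which processor and version actually run your transformation, a small stylesheet can report it. This is only a sketch: it relies on the standard `system-property()` function, and the `xsl:product-name` and `xsl:product-version` properties are defined by XSLT 2.0, so an older or draft-level processor may return empty strings for them.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Print the XSLT version, vendor and product the processor reports. -->
  <xsl:template match="/">
    <xsl:value-of select="concat(
        'XSLT version: ', system-property('xsl:version'), '&#10;',
        'Vendor: ', system-property('xsl:vendor'), '&#10;',
        'Product: ', system-property('xsl:product-name'), ' ',
        system-property('xsl:product-version'), '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

It can be run against any well-formed XML input, since it ignores the document content.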
Here is an XSLT 2.0 stylesheet tested with Saxon 9.6:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">
<xsl:output method="html" version="4.01" indent="yes"/>
<xsl:template match="/">
<xsl:for-each-group select="//h1 | //text()[not(ancestor::h1)] | //*[not(*) and not(ancestor::h1)]" group-starting-with="h1">
<xsl:variable name="copy" select="current-group()"/>
<xsl:variable name="ancestors" select="$copy/ancestor::*"/>
<xsl:variable name="filename" select="if (not(self::h1)) then 'frontpage.html' else concat('output', position() - 1, '.html')"/>
<xsl:result-document href="{$filename}">
<xsl:apply-templates select="/*">
<xsl:with-param name="copy" select="$copy"/>
<xsl:with-param name="ancestors" select="$ancestors"/>
</xsl:apply-templates>
</xsl:result-document>
</xsl:for-each-group>
</xsl:template>
<xsl:template match="node()">
<xsl:param name="copy"/>
<xsl:param name="ancestors"/>
<xsl:choose>
<xsl:when test="$copy[. is current()]">
<xsl:copy-of select="."/>
</xsl:when>
<xsl:when test="$ancestors[. is current()]">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates>
<xsl:with-param name="copy" select="$copy"/>
<xsl:with-param name="ancestors" select="$ancestors"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:when>
</xsl:choose>
</xsl:template>
<xsl:template match="head">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
Applied to the input file
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<p><span>This is my frontpage</span></p>
<div><img src="images/frontpage.png" width="100" height="50" style="border:none" /></div>
</div>
<div>
<h1> Title 1 </h1><p> some blabla for title_1 </p>
<h2> Title 1.1 </h2><p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50"/>
<h1> Title 2 </h1><p> some blabla for title_2 </p>
</div>
<div>
<p> other blabla </p>
<h1> Title 3 </h1><p> some blabla for title_3 </p>
</div>
</body>
</html>
it creates four output files
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<p><span>This is my frontpage</span></p>
<div><img src="images/frontpage.png" width="100" height="50" style="border:none"></div>
</div>
<div>
</div>
</body>
</html>
and
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 1 </h1>
<p> some blabla for title_1 </p>
<h2> Title 1.1 </h2>
<p> some blabla for title_1_1 </p><img src="images/title_1_1.png" width="50" height="50">
</div>
</body>
</html>
and
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 2 </h1>
<p> some blabla for title_2 </p>
</div>
<div>
<p> other blabla </p>
</div>
</body>
</html>
and
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title>Test</title>
<style type="text/css">body { font-family:Helvetica; font-size:9pt }}</style>
</head>
<body>
<div>
<h1> Title 3 </h1>
<p> some blabla for title_3 </p>
</div>
</body>
</html>
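So I think the stylesheet splits the nodes as required and creates the right file contents, apart from whitespace details such as the leftover empty div in the first file; for those you will need to experiment with whitespace stripping and indentation.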
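As a starting point for that experiment, the two declarations below could be placed at the top level of the stylesheet, with the second one replacing its existing `xsl:output`. This is an untested sketch, not part of the stylesheet run above: I would expect `xsl:strip-space` to remove the whitespace-only text node that pulls the second div into the first file, but it also drops whitespace between inline elements, which may or may not be acceptable for your documents.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Discard whitespace-only text nodes in the input, so they no longer
       take part in the grouping or drag their parent div into a file. -->
  <xsl:strip-space elements="*"/>

  <!-- Assumption: indentation switched off, so the serializer adds no
       whitespace of its own. Set indent="yes" again for readable output. -->
  <xsl:output method="html" version="4.01" indent="no"/>

  <!-- The templates from the stylesheet above go here unchanged. -->

</xsl:stylesheet>
```

If inline whitespace matters, the `elements` attribute can list only block-level containers such as `body` and `div` instead of `*`.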