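I am trying to convert an HTML document to a plain-text document using XSLT. However, I am quite new to XSLT, and I cannot work out why the output of my transformation differs from the output I want.
My input HTML document: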
<html>
<body>
<h1>Heading 1</h1>
<p class="first">First paragraph.</p>
<p class="para">Regular paragraph 1.</p>
<p class="para">Regular paragraph 2.</p>
<p class="para">Regular paragraph 3.</p>
<p class="last">Last paragraph.</p>
<h2 class="someclass">Heading 2</h2>
<p class="first">First paragraph 2.</p>
<p class="para">Regular paragraph 4.</p>
<p class="para">Regular paragraph 5.</p>
<p class="para">Regular paragraph 6.</p>
</body>
</html>
My desired output (plain text):
Heading (h1): Heading 1
Para (first): First paragraph.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (last): Last paragraph.
Heading (someclass): Heading 2
Para (first): First paragraph 2.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.
My XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//p[@class='first']">
Para (first): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//p[@class='para']">
Para (regular): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//p[@class='last']">
Para (last): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//h1">
Heading (h1): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//h2[@class='someclass']">
Heading (someclass): <xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Result of applying the above XSLT to the input HTML document:
Para (first): First paragraph.
Para (first): First paragraph 2.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.
Para (last): Last paragraph.
Heading (h1): Heading 1
Heading (someclass): Heading 2
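What I want is to write the content of the tags in the HTML document to plain text in the same order in which they appear in the HTML document. What this transformation does instead is group together all elements that match the same XPath expression.
I suspect the solution involves the apply-templates element, but I do not understand how it works, so I have had trouble using it in the example above.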
Answer 0 (score: 1)
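This transformation does exactly what you told it to do: it first processes all p[@class='first'] elements, then all p[@class='para'] elements, and so on. You are right that you should instead define a separate template for each of the different cases and use apply-templates, which separates the question of which elements to process from the question of what to do with each one.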
<xsl:template match="/">
<!-- process all the child elements of body in document order -->
<xsl:apply-templates select="html/body/*" />
</xsl:template>
<!-- if the element we're processing is a <p class="first"> ... -->
<xsl:template match="p[@class='first']">
Para (first): <xsl:value-of select="."/>
</xsl:template>
<!-- etc. etc. -->
<xsl:template match="p[@class='para']">
Para (regular): <xsl:value-of select="."/>
</xsl:template>
<xsl:template match="p[@class='last']">
Para (last): <xsl:value-of select="."/>
</xsl:template>
<xsl:template match="h1">
Heading (h1): <xsl:value-of select="."/>
</xsl:template>
<xsl:template match="h2[@class='someclass']">
Heading (someclass): <xsl:value-of select="."/>
</xsl:template>
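For reference, the templates above could be assembled into a complete stylesheet along the following lines. This is only a sketch under a few assumptions that are not part of the original answer: the input is well-formed XML with no namespace (as in the question), `xsl:output method="text"` is added so that the processor emits plain text rather than XML, and `xsl:text` is used so that each line gets exactly one trailing newline instead of the indentation from the stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain-text output is wanted, so no XML declaration or escaping -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Visit the child elements of body in document order; each one is
       dispatched to whichever template below matches it -->
  <xsl:template match="/">
    <xsl:apply-templates select="html/body/*"/>
  </xsl:template>

  <xsl:template match="p[@class='first']">
    <xsl:text>Para (first): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='para']">
    <xsl:text>Para (regular): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='last']">
    <xsl:text>Para (last): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="h1">
    <xsl:text>Heading (h1): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="h2[@class='someclass']">
    <xsl:text>Heading (someclass): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The ordering now comes from the single apply-templates call, which walks html/body/* in document order, rather than from the order of the templates in the stylesheet. Applied to the input document from the question, this should give the headings and paragraphs interleaved as in the desired output. Any element that none of these templates match falls through to XSLT's built-in rules, which simply copy its text content.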