Converting HTML to plain text with XSLT - apply-templates and content order

Date: 2014-04-24 15:12:59

Tags: xml xslt apply-templates

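I am trying to convert an HTML document to a plain-text document using XSLT. However, I am new to XSLT and cannot work out why the output of my transformation differs from the output I want.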

My input HTML document:

<html>
<body>
  <h1>Heading 1</h1>
  <p class="first">First paragraph.</p>
  <p class="para">Regular paragraph 1.</p>
  <p class="para">Regular paragraph 2.</p>
  <p class="para">Regular paragraph 3.</p>
  <p class="last">Last paragraph.</p>
  <h2 class="someclass">Heading 2</h2>
  <p class="first">First paragraph 2.</p>
  <p class="para">Regular paragraph 4.</p>
  <p class="para">Regular paragraph 5.</p>
  <p class="para">Regular paragraph 6.</p>
</body>
</html>

My desired output (plain text):

Heading (h1): Heading 1
Para (first): First paragraph.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (last): Last paragraph.
Heading (someclass): Heading 2
Para (first): First paragraph 2.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.

My XSLT:

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">

        <xsl:for-each select="//p[@class='first']">
            Para (first): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//p[@class='para']">
            Para (regular): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//p[@class='last']">
            Para (last): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//h1">
            Heading (h1): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//h2[@class='someclass']">
            Heading (someclass): <xsl:value-of select="."/>
        </xsl:for-each>

    </xsl:template>
</xsl:stylesheet>

The result of applying the above XSLT to the input HTML document:

Para (first): First paragraph.
Para (first): First paragraph 2.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.
Para (last): Last paragraph.
Heading (h1): Heading 1
Heading (someclass): Heading 2

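What I want is for the content of the HTML elements to appear in the plain text in the same order as it appears in the HTML document. What this transformation does instead is group together all the elements that match the same XPath expression.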

I suspect the solution is to use the apply-templates element, but I don't understand how it works, so I have had trouble using it in the example above.

1 answer:

Answer 0: (score: 1)

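The transformation is doing exactly what you told it to: it first processes all the p[@class='first'] elements, then all the p[@class='para'] elements, and so on, because each for-each selects its matches from the whole document. You are right that you should instead define a separate template for each case and use apply-templates. That separates the question of which elements to process, and in what order, from the question of what to do with each one.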

<xsl:template match="/">
  <!-- process all the child elements of body in document order -->
  <xsl:apply-templates select="html/body/*" />
</xsl:template>

<!-- if the element we're processing is a <p class="first"> ... -->
<xsl:template match="p[@class='first']">
    Para (first): <xsl:value-of select="."/>
</xsl:template>

<!-- etc. etc. -->
<xsl:template match="p[@class='para']">
    Para (regular): <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="p[@class='last']">
    Para (last): <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="h1">
    Heading (h1): <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="h2[@class='someclass']">
    Heading (someclass): <xsl:value-of select="."/>
</xsl:template>
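
The templates above can be assembled into a complete stylesheet. The following is a minimal sketch rather than the answerer's exact code: the xsl:output method="text" declaration and the xsl:text wrappers are additions, assumed here so that the result is emitted as plain text with one entry per line and without the indentation that literal text inside a template would otherwise carry into the output. It also assumes the input is well-formed XML, as the sample document in the question is.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain-text output is wanted, so no XML declaration or markup is written -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Visit the child elements of body in document order; each one is
       dispatched to whichever template below matches it -->
  <xsl:template match="/">
    <xsl:apply-templates select="html/body/*"/>
  </xsl:template>

  <xsl:template match="h1">
    <xsl:text>Heading (h1): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="h2[@class='someclass']">
    <xsl:text>Heading (someclass): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='first']">
    <xsl:text>Para (first): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='para']">
    <xsl:text>Para (regular): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='last']">
    <xsl:text>Para (last): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Because apply-templates walks html/body/* in document order and picks the matching template for each element in turn, the headings and paragraphs should come out in the order they appear in the input. Any body child that none of these templates matches (for example an h2 without class="someclass") falls through to XSLT's built-in rules, which copy its text content without a label.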