Question

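All the tutorials and examples of XSLT processing I have found seem to assume that your target differs significantly in format/structure from your source, and that you know the structure of the source in advance. I am struggling to find out how to perform simple "in-place" modifications to an HTML document without knowing anything else about its existing structure.

Could someone show me a clear example that, given an arbitrary unknown HTML source, will: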

1.) delete the classname 'foo' from all divs
2.) delete a node if it's empty (i.e. <p></p>)
3.) delete a <p> node if its first child is <br>
4.) add newattr="newvalue" to all H1
5.) replace 'heading' in text nodes with 'title'
6.) wrap all <u> tags in <b> tags (ie, <u>foo</u> -> <b><u>foo</u></b>)
7.) output the transformed document without changing anything else

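The above are examples of the primary types of transformation I want to accomplish. Understanding how to do them will go a long way towards helping me build more complex transformations.

To help clarify/test the examples, here is a sample source and output. However, I must reiterate that I want to process arbitrary samples without rewriting the XSLT for each source: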

<!doctype html>
<html>
<body>
  <h1>heading</h1>
  <p></p>
  <p><br>line</p>
  <div class="foo bar"><u>baz</u></div>
  <p>untouched</p>
</body>
</html>

Output:

<!doctype html>
<html>
<body>
  <h1 newattr="newvalue">title</h1>
  <div class="bar"><b><u>baz</u></b></div>
  <p>untouched</p>
</body>
</html>

Answer 1

1.) Delete the class name 'foo' from all divs

<xsl:template match="div[contains(concat(' ', @class, ' '), ' foo ')]">
  <xsl:copy>
    <xsl:attribute name="class">
      <xsl:variable name="s" select="substring-before(concat(' ', @class, ' '), ' foo ')" />
      <xsl:variable name="e" select="substring-after(concat(' ', @class, ' '), ' foo ')" />
      <xsl:value-of select="normalize-space(concat($s, ' ', $e))" />
    </xsl:attribute>
    <xsl:apply-templates select="node() | @*[name() != 'class']" />
  </xsl:copy>
</xsl:template>

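2.) Delete a node if it is empty (i.e. <p></p>)

<!-- Caution: this also matches void elements such as <br/> and <img/>; narrow the match if those must be kept. -->
<xsl:template match="*[normalize-space() = '']" />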

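3.) Delete a <p> node if its first child is <br>

<!-- *[1] is the first child element; use p[node()[1][self::br]] to require that <br> is the very first child node. -->
<xsl:template match="p[*[1]/self::br]" />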

4.) Add newattr="newvalue" to all <h1>

<xsl:template match="h1[not(@newattr)]">
  <xsl:copy>
    <xsl:attribute name="newattr">
      <xsl:value-of select="'newvalue'" />
    </xsl:attribute>
    <xsl:apply-templates select="node() | @*" />
  </xsl:copy>
</xsl:template>

5.) Replace 'heading' in text nodes with 'title'

<!-- This replaces the first occurrence of 'heading', case-sensitively.
     More generic search-and-replace templates are plenty, here on SO as well as 
     elsewhere on the 'net. -->
<xsl:template match="text()[contains(concat(' ', ., ' '), ' heading ')]">
  <xsl:variable name="s" select="substring-before(concat(' ', ., ' '), ' heading ')" />
  <xsl:variable name="e" select="substring-after(concat(' ', ., ' '), ' heading ')" />
  <xsl:value-of select="normalize-space(concat($s, ' title ', $e))" />
</xsl:template>
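
Where 'heading' can occur more than once in a text node, a recursive named template is the usual XSLT 1.0 workaround. The following is only a minimal sketch and not part of the original answer: the template name replace-all and its parameter names are made up for illustration. It replaces plain substrings, so 'subheading' would become 'subtitle'. It is meant to be used instead of the template above, since both would match the same text nodes.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies everything not handled elsewhere -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: replaces every occurrence of $search in $text -->
  <xsl:template name="replace-all">
    <xsl:param name="text" />
    <xsl:param name="search" />
    <xsl:param name="replacement" />
    <xsl:choose>
      <!-- the empty-string guard prevents endless recursion -->
      <xsl:when test="$search != '' and contains($text, $search)">
        <xsl:value-of select="substring-before($text, $search)" />
        <xsl:value-of select="$replacement" />
        <!-- recurse on whatever follows the first match -->
        <xsl:call-template name="replace-all">
          <xsl:with-param name="text" select="substring-after($text, $search)" />
          <xsl:with-param name="search" select="$search" />
          <xsl:with-param name="replacement" select="$replacement" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Apply the helper to every text node that contains 'heading' -->
  <xsl:template match="text()[contains(., 'heading')]">
    <xsl:call-template name="replace-all">
      <xsl:with-param name="text" select="." />
      <xsl:with-param name="search" select="'heading'" />
      <xsl:with-param name="replacement" select="'title'" />
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the version above, this sketch does not call normalize-space(), so whitespace inside the text node is left as it was.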

6.) Wrap all <u> tags in <b> tags (i.e. <u>foo</u> -> <b><u>foo</u></b>)

<xsl:template match="u[not(parent::*/self::b)]">
  <b>
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </b>
</xsl:template>

7.) Output the transformed document without changing anything else

<!-- the identity template copies everything that is not handled by 
     any of the more specific templates above -->
<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="node() | @*" />
  </xsl:copy>
</xsl:template>

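When more than one template can match the same node, template order and specificity determine which template "wins".

Specificity means: "among multiple competing templates, the one with the more complex match rule wins".

Order means: "among multiple competing templates with the same specificity, the one that occurs later in the XSLT document wins".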
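
To make these rules concrete, here is a small sketch (not from the original answer) that combines two of the templates above. In XSLT 1.0 the "specificity" is a default priority computed from the match pattern: a bare node test such as node() or @* gets -0.5, a plain element name gets 0, and anything with a predicate gets 0.5. An empty <u></u> is matched by both the "delete empty" rule and the "wrap <u>" rule, which share priority 0.5, so the outcome would depend on their order. Strictly speaking the specification treats such a tie as an error that a processor may recover from by picking the last rule. Setting an explicit priority attribute removes the ambiguity; the value 1 is an arbitrary choice for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: node() and @* are bare node tests, so the
       default priority is -0.5 and every rule below beats it. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- Explicit priority: empty elements are dropped even if another
       predicate-based rule (default priority 0.5) also matches them. -->
  <xsl:template match="*[normalize-space() = '']" priority="1" />

  <!-- Default priority 0.5: wraps every <u> that is not already inside <b> -->
  <xsl:template match="u[not(parent::b)]">
    <b>
      <xsl:copy>
        <xsl:apply-templates select="node() | @*" />
      </xsl:copy>
    </b>
  </xsl:template>

</xsl:stylesheet>
```

With this arrangement <u>foo</u> becomes <b><u>foo</u></b>, while an empty <u></u> is removed regardless of where the two rules sit in the file.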
