Stripping entities from XML using XSLT

Date: 2014-05-16 22:01:23

Tags: html xml xslt

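I have a simple XML workflow in which I need to transform XML into HTML, but the XML contains one or more entities that I want to remove. For example, the source XML contains the bullet entity &#8226; (•).

I want to either remove the literal bullet characters or replace them with HTML bullets using <ul><li> elements.

Here is the XSLT: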

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:apply-templates select="RULEBOOK"/>
</xsl:template>

<xsl:template match="text()[starts-with(., '&#8226;    ')]">
<xsl:value-of select="substring-after(., '&#8226;    ')"/>
</xsl:template>


<xsl:template match="bullets">
<li><xsl:apply-templates/></li>
</xsl:template>

<xsl:template match="ul">
<ul><xsl:apply-templates/></ul>
</xsl:template>
</xsl:stylesheet>

Here is the XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<RULEBOOK>
<SECTION>
<Subsection>
<text>The game of golf should be played in the correct spirit and to understand this you should read the Etiquette Section in the Rules of Golf. In particular: </text>
<ul>
<bullets>&#8226;    show consideration to other players </bullets>
<bullets>&#8226;    play at a good pace and be ready to invite faster moving groups to play through, and </bullets>
<bullets>&#8226;    take care of the course by smoothing bunkers, replacing divots and repairing ball marks on the greens. </bullets>
</ul>
<text>Before starting your round you are advised to: </text>
<ul>
<bullets>&#8226;    read the Local Rules on the score card and the notice board </bullets>
<bullets>&#8226;    put an identification mark on your ball; many golfers play the same brand of ball and if you can&#8217;t identify your ball, it is considered lost (Rules <a>12-2</a> and <a>27-1</a>) </bullets>
<bullets>&#8226;    count your clubs; you are allowed a maximum of 14 clubs (Rule <a>4-4</a>).
</bullets>
</ul>
<text>During the round: </text>
<ul>
<bullets>&#8226;    don&#8217;t ask for advice from anyone except your partner (i.e., a player on your side) or your caddies; don&#8217;t give advice to anyone except your partner; you may ask for information on the Rules, distances and the position of hazards, the flagstick, etc. (Rule <a>8-1</a>) </bullets>
<bullets>&#8226;    don&#8217;t play any practice shots during play of a hole (Rule <a>7-2</a>) </bullets>
<bullets>&#8226;    don&#8217;t use any artificial devices or unusual equipment, unless specifically authorized by Local Rule (Rule <a>14-3</a>). </bullets>
</ul>
<text>At the end of your round:  </text>
<ul>
<bullets>&#8226;    in match play, ensure the result of the match is posted </bullets>
<bullets>&#8226;    in stroke play, ensure that your score card is completed properly (including being signed by you and your marker) and return it to the Committee as soon as possible (Rule <a>6-6</a>). </bullets>
</ul>
</Subsection>
</SECTION></RULEBOOK>

1 answer:

Answer 0 (score: 1)

You can add this template:

<xsl:template match="text()[starts-with(., '&#8226;    ')]">
    <xsl:value-of select="substring-after(., '&#8226;    ')"/>
</xsl:template>

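It matches any text node that starts with a bullet followed by four spaces and keeps only the substring after that prefix.

@keshlam suggested a solution using translate(), which is better because it does not depend on the spaces and does not fail if any characters precede the bullet (but it removes bullets anywhere in the text, not only at the start):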

<xsl:template match="text()[contains(., '&#8226;')]">
    <xsl:value-of select="normalize-space(translate(., '&#8226;',''))"/>
</xsl:template>

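The normalize-space() function trims the text: it strips leading and trailing whitespace and collapses runs of extra spaces or tabs into a single space.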
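
To make the effect of the two functions concrete, here is a minimal sketch that applies the same expression to a hard-coded string instead of the RULEBOOK input. The variable name raw and its sample value are made up for illustration; the stylesheet ignores its input, so it can be run against any well-formed XML document.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Hypothetical sample: one bullet at the start, one in the middle -->
        <xsl:variable name="raw" select="'&#8226;    ready &#8226;  set '"/>
        <!-- translate() with an empty third argument deletes every bullet;
             normalize-space() then trims the ends and collapses inner runs of whitespace -->
        <xsl:value-of select="concat('[', normalize-space(translate($raw, '&#8226;', '')), ']')"/>
    </xsl:template>
</xsl:stylesheet>
```

With an XSLT 1.0 processor this should output [ready set]: both bullets are removed, including the one in the middle of the string, and the runs of spaces collapse to single spaces.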

This works with XSLT 1.0 processors such as Xalan or Saxon 6.

Update

Here is a complete stylesheet (effectively the same as the one you posted, with the previous template included):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" encoding="UTF-8"/>
    <xsl:template match="/">
        <xsl:apply-templates select="RULEBOOK"/>
    </xsl:template>

    <xsl:template match="text()[contains(., '&#8226;')]">
        <xsl:value-of select="normalize-space(translate(., '&#8226;',''))"/>
    </xsl:template>

    <xsl:template match="bullets">
        <li><xsl:apply-templates/></li>
    </xsl:template>

    <xsl:template match="ul">
        <ul><xsl:apply-templates/></ul>
    </xsl:template>

</xsl:stylesheet>

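It works with your source when the source is copied from this page and pasted into a new file. If it does not work on your original file with exactly the same content, the original file may have a different encoding.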
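
One caveat with the normalize-space() approach: it also strips trailing whitespace from the matched text node, so in mixed content such as "(Rules <a>12-2</a>" the space before the <a> element is lost. If that matters, a possible variant is sketched below. It assumes each bullet is the first non-whitespace character of its text node, and it removes only that leading bullet and the whitespace right after it. The variable names rest and first are illustrative and not part of the original answer.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" encoding="UTF-8"/>

    <xsl:template match="/">
        <xsl:apply-templates select="RULEBOOK"/>
    </xsl:template>

    <!-- Text nodes whose first non-whitespace character is a bullet -->
    <xsl:template match="text()[starts-with(normalize-space(.), '&#8226;')]">
        <!-- Everything after the first bullet, leading whitespace still attached -->
        <xsl:variable name="rest" select="substring-after(., '&#8226;')"/>
        <!-- First non-whitespace character of the remainder (empty if there is none) -->
        <xsl:variable name="first" select="substring(normalize-space($rest), 1, 1)"/>
        <!-- Skip the whitespace in front of that character; trailing whitespace is kept -->
        <xsl:value-of select="substring($rest, string-length(substring-before($rest, $first)) + 1)"/>
    </xsl:template>

    <xsl:template match="bullets">
        <li><xsl:apply-templates/></li>
    </xsl:template>

    <xsl:template match="ul">
        <ul><xsl:apply-templates/></ul>
    </xsl:template>

</xsl:stylesheet>
```

This template is meant to be used in place of the contains()-based one rather than alongside it, since both patterns would match the same text nodes. Unlike the translate() version, it leaves any bullet that appears later in the text untouched.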