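Why is my XSLT stripping HTML tags here?

Time: 2014-01-24 21:12:28

Tags: html xml json xslt

I am using XSLT 1.0 to transform some XML into JSON output. Unfortunately, some of the XML I am working with contains HTML markup. Here is a sample of the XML input: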

 <text>
 Kevin Love and Steph Curry can talk about their first-
 time starting gigs in the All-Star game Friday night when the Minnesota
 Timberwolves visit Oracle Arena to face the Golden State Warriors.
</text>
  <continue>
    <P>
 Love and Curry were two of four first-time All-Star starters when the league
 made the announcement on Thursday.
</P>
    <P>
 Love got a late push to overtake Houston Rockets center Dwight Howard in the
 final week of voting.
</P>
    <P>
 "I think it's a little sweeter this way because I really didn't expect it,"
 Love said on a conference call. "I was already humbled by the response the
 fans gave me to being very close to the top (frontcourt players). The outreach
 by the Minnesota fans and beyond was truly amazing."
</P>
</continue>

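The markup is not ideal, and I need to preserve the <P> tags in my JSON output. To deal with the double quotes, I escape them. Here are the templates I use for that: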

<xsl:variable name="escaped-continue">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="continue"/>
        <xsl:with-param name="replace" select="'&quot;'" />
        <xsl:with-param name="with" select="'\&quot;'"/>
      </xsl:call-template>
    </xsl:variable>
     <xsl:variable name="escaped-text">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="text"/>
        <xsl:with-param name="replace" select="'&quot;'" />
        <xsl:with-param name="with" select="'\&quot;'"/>
      </xsl:call-template>
    </xsl:variable>
 <xsl:template name="replace-string">
        <xsl:param name="text"/>
        <xsl:param name="replace"/>
        <xsl:param name="with"/>
        <xsl:choose>
            <xsl:when test="contains($text,$replace)">
                <xsl:value-of select="substring-before($text,$replace)"/>
                <xsl:value-of select="$with"/>
                <xsl:call-template name="replace-string">
                    <xsl:with-param name="text"
                        select="substring-after($text,$replace)"/>
                    <xsl:with-param name="replace" select="$replace"/>
                    <xsl:with-param name="with" select="$with"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
   </xsl:template>

Then I simply output the JSON with the following:

{
    "text": "<xsl:value-of select="normalize-space($escaped-text)"/>", 
    "continue": "<xsl:value-of select="normalize-space($escaped-continue)"/>"
}

The problem I am running into is that the output looks like this:

{
 "text": "Kevin Love and Steph Curry can talk about their first- time starting gigs in the All-Star game Friday night when the Minnesota Timberwolves visit Oracle Arena to face the Golden State Warriors.", 
  "continue": "Love and Curry were two of four first-time All-Star starters when the league made the announcement on Thursday. Love got a late push to overtake Houston Rockets center Dwight Howard in the final week of voting. \"I think it's a little sweeter this way because I really didn't expect it,\" Love said on a conference call. \"I was already humbled by the response the fans gave me to being very close to the top (frontcourt players). The outreach by the Minnesota fans and beyond was truly amazing.\"
}

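As you can see, the double quotes are escaped correctly, but the <P> tags have been stripped and/or parsed by the XSLT processor and then suppressed by normalize-space(). What is the best way to get the <P> tags back into the output here?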

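3 answers:

Answer 0 (score: 1)

That is how xsl:value-of is defined: it outputs the string value of the selected node, which for an element is just the text of its descendants. If you want to keep the markup, use xsl:copy-of.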
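A minimal sketch of the difference, assuming the question's input is wrapped in a hypothetical root element; the result, as-string and as-copy element names are made up for illustration. The same string conversion also happens when a node-set is handed to string functions such as contains() or substring-before(), which is what the replace-string template in the question does with continue.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Assumption: the input has a root element around text and continue -->
  <xsl:template match="/root">
    <result>
      <!-- String value of continue: descendant text only, no P tags -->
      <as-string>
        <xsl:value-of select="continue"/>
      </as-string>
      <!-- Deep copy of the children of continue: the P elements survive -->
      <as-copy>
        <xsl:copy-of select="continue/node()"/>
      </as-copy>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

Note that xsl:copy-of copies the nodes unchanged, so on its own it leaves no place to escape the quotes inside the text.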

Answer 1 (score: 1)

Try it this way:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes" />

<xsl:template match="/root">
    <xsl:text>{&#10;"text": "</xsl:text>
    <xsl:apply-templates select="text/text()"/>
    <xsl:text>"&#10;"continue": "</xsl:text>
    <xsl:apply-templates select="continue/*"/>
    <xsl:text>"&#10;}</xsl:text>
</xsl:template>

<xsl:template match="*">
    <xsl:copy>
        <xsl:apply-templates/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()">
<xsl:variable name="escaped-text">
    <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="replace" select="'&quot;'" />
        <xsl:with-param name="with" select="'\&quot;'"/>
    </xsl:call-template>
</xsl:variable>
<xsl:value-of select="normalize-space($escaped-text)"/>
</xsl:template>

<xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
        <xsl:when test="contains($text,$replace)">
            <xsl:value-of select="substring-before($text,$replace)"/>
            <xsl:value-of select="$with"/>
            <xsl:call-template name="replace-string">
                <xsl:with-param name="text"
                    select="substring-after($text,$replace)"/>
                <xsl:with-param name="replace" select="$replace"/>
                <xsl:with-param name="with" select="$with"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Applied to a modified version of the input (with a root element and some extra markup added for testing):

<root>
    <text>
    Kevin Love and Steph Curry can talk about their first-
    time starting gigs in the All-Star game Friday night when the Minnesota
    Timberwolves visit Oracle Arena to face the Golden State Warriors.
    </text>
    <continue>
        <P>
        Love and Curry were <i>two of <b>four</b> first-time All-Star</i> starters when the league
        made the announcement on Thursday.
        </P>
        <P>
        Love got a late push to overtake Houston Rockets center Dwight Howard in the
        final week of voting.
        </P>
        <P>
        "I think it's a little sweeter this way because I really didn't expect it,"
        Love said on a conference call. "I was already humbled by the response the
        fans gave me to being very close to the top (frontcourt players). The outreach
        by the Minnesota fans and beyond was truly amazing."
        </P>
    </continue>
</root>

it produces the following result:

{
"text": "Kevin Love and Steph Curry can talk about their first- time starting gigs in the All-Star game Friday night when the Minnesota Timberwolves visit Oracle Arena to face the Golden State Warriors."
"continue": "<P>Love and Curry were<i>two of<b>four</b>first-time All-Star</i>starters when the league made the announcement on Thursday.</P><P>Love got a late push to overtake Houston Rockets center Dwight Howard in the final week of voting.</P><P>\"I think it's a little sweeter this way because I really didn't expect it,\" Love said on a conference call. \"I was already humbled by the response the fans gave me to being very close to the top (frontcourt players). The outreach by the Minnesota fans and beyond was truly amazing.\"</P>"
}
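Two details of this result are worth noting. There is no comma between the "text" and "continue" members, so the second xsl:text in the stylesheet would need to start with `",` to give valid JSON. And because normalize-space() is applied to each text node separately, the spaces next to inline elements are lost ("were<i>two of<b>four</b>first-time"). The following is one possible variant of the text() template, not part of the original answer, that keeps a single space at an edge where the text node had whitespace next to a sibling node. Quote escaping is left out to keep the sketch short, and it only emits the content of continue.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes"/>

  <!-- Start from the children of continue in the test input above -->
  <xsl:template match="/root">
    <xsl:apply-templates select="continue/*"/>
  </xsl:template>

  <!-- Copy each element and process its children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:variable name="norm" select="normalize-space(.)"/>
    <!-- Keep one leading space if the node starts with whitespace
         and has a sibling before it -->
    <xsl:if test="$norm != '' and preceding-sibling::node()
                  and normalize-space(substring(., 1, 1)) = ''">
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:value-of select="$norm"/>
    <!-- Keep one trailing space if the node ends with whitespace
         and has a sibling after it -->
    <xsl:if test="$norm != '' and following-sibling::node()
                  and normalize-space(substring(., string-length(.))) = ''">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Whitespace-only text nodes, such as the indentation between the P elements, normalize to an empty string and produce no output at all.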

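Answer 2 (score: 0)

When you pass continue as the text parameter in escaped-continue, you lose the <P> tags at that step, because the string functions in replace-string work on the string value of the element. You can either use EXSLT node-sets with XSLT 1.0 and process the nodes inside the replace-string template, or rewrite escaped-continue so that it walks the nodes and text and calls replace-string only for text nodes.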
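The second option might look roughly like the sketch below. It assumes the input is wrapped in a root element, uses a hypothetical mode name as-text, and writes the tags as literal text with method="text" so that nothing in the JSON string gets XML-escaped. Attributes are ignored and backslashes are not escaped, so treat it as a starting point rather than a complete JSON serializer.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8"/>

  <!-- Assumption: the input has a root element around continue -->
  <xsl:template match="/root">
    <xsl:text>{"continue": "</xsl:text>
    <xsl:apply-templates select="continue/node()" mode="as-text"/>
    <xsl:text>"}</xsl:text>
  </xsl:template>

  <!-- Elements: write the tags as literal text, then walk the children -->
  <xsl:template match="*" mode="as-text">
    <xsl:value-of select="concat('&lt;', name(), '&gt;')"/>
    <xsl:apply-templates select="node()" mode="as-text"/>
    <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
  </xsl:template>

  <!-- Text nodes: turn line breaks and tabs into spaces (raw control
       characters are not allowed in JSON strings), then escape quotes -->
  <xsl:template match="text()" mode="as-text">
    <xsl:call-template name="replace-string">
      <xsl:with-param name="text"
          select="translate(., '&#10;&#13;&#9;', '   ')"/>
      <xsl:with-param name="replace" select="'&quot;'"/>
      <xsl:with-param name="with" select="'\&quot;'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Same recursive replace-string template as in the question -->
  <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
      <xsl:when test="contains($text, $replace)">
        <xsl:value-of select="substring-before($text, $replace)"/>
        <xsl:value-of select="$with"/>
        <xsl:call-template name="replace-string">
          <xsl:with-param name="text"
              select="substring-after($text, $replace)"/>
          <xsl:with-param name="replace" select="$replace"/>
          <xsl:with-param name="with" select="$with"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Because the indentation between elements is turned into spaces rather than removed, the resulting string contains runs of spaces; these are harmless in JSON and collapse when the markup is rendered as HTML.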