XML transformation: elements appear in the wrong place in the document

Date: 2010-04-08 14:39:03

Tags: xml xpath xslt xslt-2.0

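I'm having some problems with an XML transformation and need some help.

The stylesheet should go through all suffix elements and place the content of each one, without the suffix tags, next to the last text node of its first ancestor quote-block element (see the desired output). It works when there is a single suffix, but not when there are two: in that case both suffixes are placed together at the last text node of the first quote-block.

Any ideas? I have tried restricting the selection to ancestor::quote-block[1] in various places, but that did not have the desired effect.

Source XML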

<paragraph>
    <para>
        <quote-block>
            <list prefix-rules="specified">
                <item prefix="“B42">
                    <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                        reached an agreement to negotiate towards a direct contract for coal haulage
                        by rail (on a DIY basis), which would replace the previous indirect E2E
                        arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
                            <quote-para>‘We did the deal with Edison Mission yesterday morning for
                                LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                                pending a contract.</quote-para>
                            <quote-para><emphasis strength="strong">Enron are now off our hands so
                                    far as Edison are concerned. The Enron flows we have left are to
                                    British Energy’s station at Eggborough; from Immingham, Redcar
                                    and Hull</emphasis>. Also to Enron’s own power station at Wilton
                                – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                                Eggborough traffic until next April when British Energy will,
                                hopefully take over their own coal procurement. <emphasis
                                    strength="strong">But we have got them out of Fiddlers Ferry and
                                    Ferrybridge – a big step forward</emphasis>.’</quote-para>
                            <suffix>(Emphasis added.)</suffix>
                        </quote-block>
                    </para>
                </item>
                <item prefix="B43">
                    <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                        EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                        indirect supplies to EME, one of the new generating companies.”</para>
                </item>
            </list>
            <suffix>(emphasis in original)</suffix>
        </quote-block>
    </para>
</paragraph>

Stylesheet

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://xml.sm.com/schema/cases/report"
    xmlns:sm="http://xml.sm.com/functions" xmlns:saxon="http://saxon.sf.net/"
    xpath-default-namespace="http://sm.com/schema/cases/report"
    exclude-result-prefixes="xs sm" version="2.0">

    <xsl:output method="xml" indent="no"/>

    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <!-- Match quote-blocks with open or close attributes. -->
    <xsl:template match="*[*:quote-block and descendant::*:suffix]">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- Match inline quote with open or close attributes -->
    <xsl:template match="*[*:quote and descendant::*:suffix]">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- Process the quote block -->
    <xsl:template name="process-quote-block">
        <xsl:variable name="quoteBlockCopy">
            <xsl:copy-of select="."/>
        </xsl:variable>

        <xsl:apply-templates select="$quoteBlockCopy" mode="append-suffix">
            <xsl:with-param name="suffix" select="sm:get-suffix-note(.)"/>
            <xsl:with-param name="end-node" select="sm:get-last-text-node($quoteBlockCopy)"/>
        </xsl:apply-templates>
    </xsl:template>

    <!-- Match quote-blocks with open or close attributes. -->
    <xsl:template match="*[*:quote-block and descendant::*:suffix][ancestor::*:quote-block[1]]" mode="create-copy">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- Match inline quote with open or close attributes -->
    <xsl:template match="*[*:quote and descendant::*:suffix]" mode="create-copy">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- This will match all elements. Just copy and pass through the parameters. -->
    <xsl:template match="*" mode="append-suffix">
        <xsl:param name="suffix"/>
        <xsl:param name="end-node"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates mode="append-suffix">
                <xsl:with-param name="suffix" select="$suffix"/>
                <xsl:with-param name="end-node" select="$end-node"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

    <!-- Apply the text node to the content. If the node is equal to the last node then append the descendants of suffix  -->
    <xsl:template match="text()[normalize-space() != '']" mode="append-suffix">
        <xsl:param name="suffix"/>
        <xsl:param name="end-node"/>
        <xsl:choose>
            <xsl:when test="count(. | $end-node) = 1">
                <xsl:value-of select="."/>
                <xsl:apply-templates select="$suffix"/>
            </xsl:when>
            <xsl:otherwise>
                <!-- Or maybe neither. -->
                <xsl:value-of select="."/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!--  Dont copy suffix as -->
    <xsl:template match="*:suffix" mode="append-suffix"/>

    <xsl:function name="sm:get-suffix-note">
        <xsl:param name="node"/>
        <xsl:sequence select="$node/descendant::*:suffix/node()"/>
    </xsl:function>

    <xsl:function name="sm:get-last-text-node">
        <!--  Finds last non-empty text() node, ignoring <suffix> elements that are a child of this specific quote-block. -->
        <xsl:param name="node"/>

        <xsl:sequence
            select="reverse($node//text()[not(ancestor::*:suffix) and normalize-space() != ''])[1]"/>
    </xsl:function>

</xsl:stylesheet>

Current output XML

<paragraph>
    <para>
        <quote-block>
            <list prefix-rules="specified">
                <item prefix="“B42">
                    <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                        reached an agreement to negotiate towards a direct contract for coal haulage
                        by rail (on a DIY basis), which would replace the previous indirect E2E
                        arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
                            <quote-para>‘We did the deal with Edison Mission yesterday morning for
                                LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                                pending a contract.</quote-para>
                            <quote-para><emphasis strength="strong">Enron are now off our hands so
                                    far as Edison are concerned. The Enron flows we have left are to
                                    British Energy’s station at Eggborough; from Immingham, Redcar
                                    and Hull</emphasis>. Also to Enron’s own power station at Wilton
                                – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                                Eggborough traffic until next April when British Energy will,
                                hopefully take over their own coal procurement. <emphasis
                                    strength="strong">But we have got them out of Fiddlers Ferry and
                                    Ferrybridge – a big step forward</emphasis>.’</quote-para>
                        </quote-block>
                    </para>
                </item>
                <item prefix="B43">
                    <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                        EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                        indirect supplies to EME, one of the new generating companies.”(Emphasis
                        added.)(emphasis in original)</para>
                </item>
            </list>

        </quote-block>
    </para>
</paragraph>

Desired output

<paragraph>
    <para>
        <quote-block>
            <list prefix-rules="specified">
                <item prefix="“B42">
                    <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                        reached an agreement to negotiate towards a direct contract for coal haulage
                        by rail (on a DIY basis), which would replace the previous indirect E2E
                        arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
                            <quote-para>‘We did the deal with Edison Mission yesterday morning for
                                LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                                pending a contract.</quote-para>
                            <quote-para><emphasis strength="strong">Enron are now off our hands so
                                    far as Edison are concerned. The Enron flows we have left are to
                                    British Energy’s station at Eggborough; from Immingham, Redcar
                                    and Hull</emphasis>. Also to Enron’s own power station at Wilton
                                – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                                Eggborough traffic until next April when British Energy will,
                                hopefully take over their own coal procurement. <emphasis
                                    strength="strong">But we have got them out of Fiddlers Ferry and
                                    Ferrybridge – a big step forward</emphasis>.’(Emphasis
                                added.)</quote-para>
                        </quote-block>
                    </para>
                </item>
                <item prefix="B43">
                    <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                        EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                        indirect supplies to EME, one of the new generating companies.”(emphasis in original)</para>
                </item>
            </list>

        </quote-block>
    </para>
</paragraph>

2 answers:

Answer 0 (score: 1)

Man, you've dug yourself a hole here. ;-) Here is what I came up with:

<xsl:stylesheet 
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="xml" encoding="utf-8" indent="no"/>

  <!-- key to identify all non-empty, non-suffix text node descendants of
       a quote-block. We'll use that to pull out the "last one" later-on -->
  <xsl:key 
    name ="kQbText" 
    match="quote-block//text()[not(normalize-space() = '' or parent::suffix)]"
    use  ="generate-id(ancestor::quote-block[1])"
  />

  <!-- identity template to copy everything that is not otherwise handled -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <!-- special handling for text nodes that are descendants of quote-blocks -->
  <xsl:template match="quote-block//text()[not(normalize-space() = '' or parent::suffix)]">
    <xsl:variable name="qb" select="ancestor::quote-block[1]" />

    <!-- the text node gets copied regardless -->
    <xsl:copy-of select="." />

    <!-- if it is the last non-empty text node, append all suffixes -->
    <xsl:if test="
      generate-id() 
      = 
      generate-id( key('kQbText', generate-id($qb))[last()] )
    ">
      <xsl:for-each select="$qb/suffix">
        <xsl:value-of select="concat(' ', .)" />
      </xsl:for-each>
    </xsl:if>
  </xsl:template>

  <!-- empty text nodes will be removed (all others are copied) -->
  <xsl:template match="text()[normalize-space() = '']" />

  <!-- suffix nodes will be deleted-->
  <xsl:template match="suffix" />

</xsl:stylesheet>

Result of the above (indentation and line breaks added by tidy for readability):

<paragraph>
  <para>
    <quote-block>
      <list prefix-rules="specified">
        <item prefix="“B42">
          <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June
          2000, EME and EWS reached an agreement to negotiate
          towards a direct contract for coal haulage by rail (on a
          DIY basis), which would replace the previous indirect E2E
          arrangements that EME had in place with ECSL. An internal
          EWS e-mail noted: 
          <quote-block>
            <quote-para>‘We did the deal with Edison Mission
            yesterday morning for LBT-Fiddlers @ £[…]/tonne as
            agreed. This rate until 16th September pending a
            contract.</quote-para>
            <quote-para>
            <emphasis strength="strong">Enron are now off our hands
            so far as Edison are concerned. The Enron flows we have
            left are to British Energy’s station at Eggborough;
            from Immingham, Redcar and Hull</emphasis>. Also to
            Enron’s own power station at Wilton – 250,000
            tonnes/year. I think we are stuck Enron [sic] on the
            Eggborough traffic until next April when British Energy
            will, hopefully take over their own coal procurement. 
            <emphasis strength="strong">But we have got them out of
            Fiddlers Ferry and Ferrybridge – a big step
            forward</emphasis>.’ (Emphasis added.)</quote-para>
          </quote-block></para>
        </item>
        <item prefix="B43">
          <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This
          e-mail is evidence of both EWS’s intent and, indeed, its
          success in stopping ECSL from carrying out indirect
          supplies to EME, one of the new generating companies.”
          (emphasis in original)</para>
        </item>
      </list>
    </quote-block>
  </para>
</paragraph>

The XSLT code here is XSLT 1.0, but you can run it unchanged in a 2.0 processor.
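
Since the question is tagged xslt-2.0, the same idea can also be sketched without a key, using the XPath 2.0 `is` operator to compare nodes by identity. This is only a minimal sketch, not the asker's or the answerer's code. It assumes the elements are in no namespace, as in the posted source XML, and the variable names `qb` and `last` are illustrative. As far as the posted code shows, the original stylesheet collects every descendant suffix in sm:get-suffix-note and looks for the last text node of the whole copied subtree, which would explain why both suffixes end up together. The sketch instead scopes both lookups to the nearest quote-block.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: copy everything that is not otherwise handled -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- non-empty text nodes inside a quote-block, excluding suffix text -->
  <xsl:template match="quote-block//text()[normalize-space() and not(parent::suffix)]">
    <!-- nearest enclosing quote-block of this text node -->
    <xsl:variable name="qb" select="ancestor::quote-block[1]"/>
    <!-- last qualifying text node whose nearest quote-block is $qb -->
    <xsl:variable name="last"
      select="($qb//text()[normalize-space() and not(parent::suffix)]
                          [ancestor::quote-block[1] is $qb])[last()]"/>

    <xsl:copy-of select="."/>

    <!-- append only the suffix children of this particular quote-block -->
    <xsl:if test=". is $last">
      <xsl:value-of select="$qb/suffix" separator=" "/>
    </xsl:if>
  </xsl:template>

  <!-- suffix elements themselves are dropped -->
  <xsl:template match="suffix"/>

</xsl:stylesheet>
```

Unlike the stylesheet above, this variant leaves whitespace-only text nodes in place and does not insert a space before the suffix, so the result should stay closer to the layout of the desired output.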

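Answer 1 (score: 1)

Here is a simple transformation that solves just this problem. As others have noted, the problem is specified in a very confusing way and does not allow a single, unambiguous interpretation.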

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:strip-space elements="*"/>

 <xsl:key name="kLastNonSufText"
   match="*[not(self::suffix)]/text()"
   use="generate-id(ancestor::quote-block[1])"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()[ancestor::quote-block]">
  <xsl:copy-of select="."/>

  <xsl:variable name="vQBImmed" select="ancestor::quote-block[1]"/>

  <xsl:variable name="vLastText" select=
   "key('kLastNonSufText', generate-id($vQBImmed))
      [last()]"/>

  <xsl:if test="count(.|$vLastText) = 1">
      <xsl:copy-of select="($vQBImmed//suffix)[last()]/text()"/>
  </xsl:if>
 </xsl:template>

 <xsl:template match="suffix"/>
</xsl:stylesheet>

When this transformation is applied to the provided source XML document (which is very hard to read and poorly formatted):

<paragraph>
 <para>
  <quote-block>
    <list prefix-rules="specified">
        <item prefix="“B42">
            <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                reached an agreement to negotiate towards a direct contract for coal haulage
                by rail (on a DIY basis), which would replace the previous indirect E2E
                arrangements that EME had in place with ECSL. An internal EWS e-mail noted:
                <quote-block>
                    <quote-para>‘We did the deal with Edison Mission yesterday morning for
                        LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                        pending a contract.</quote-para>
                    <quote-para>
                        <emphasis strength="strong">Enron are now off our hands so
                            far as Edison are concerned. The Enron flows we have left are to
                            British Energy’s station at Eggborough; from Immingham, Redcar
                            and Hull</emphasis>. Also to Enron’s own power station at Wilton
                        – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                        Eggborough traffic until next April when British Energy will,
                        hopefully take over their own coal procurement.
                        <emphasis
                            strength="strong">But we have got them out of Fiddlers Ferry and
                            Ferrybridge – a big step forward</emphasis>.’
                    </quote-para>
                    <suffix>(Emphasis added.)</suffix>
                </quote-block>
            </para>
        </item>
        <item prefix="B43">
            <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                indirect supplies to EME, one of the new generating companies.”</para>
        </item>
    </list>
    <suffix>(emphasis in original)</suffix>
  </quote-block>
 </para>
</paragraph>

the desired suffixes are output, appended to the desired text nodes:

<?xml version="1.0" encoding="UTF-16"?><paragraph><para><quote-block><list prefix-rules="specified"><item prefix="“B42"><para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                reached an agreement to negotiate towards a direct contract for coal haulage
                by rail (on a DIY basis), which would replace the previous indirect E2E
                arrangements that EME had in place with ECSL. An internal EWS e-mail noted:
                <quote-block><quote-para>‘We did the deal with Edison Mission yesterday morning for
                        LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                        pending a contract.</quote-para><quote-para><emphasis strength="strong">Enron are now off our hands so
                            far as Edison are concerned. The Enron flows we have left are to
                            British Energy’s station at Eggborough; from Immingham, Redcar
                            and Hull</emphasis>. Also to Enron’s own power station at Wilton
                        – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                        Eggborough traffic until next April when British Energy will,
                        hopefully take over their own coal procurement.
                        <emphasis strength="strong">But we have got them out of Fiddlers Ferry and
                            Ferrybridge – a big step forward</emphasis>.’
                    (Emphasis added.)</quote-para></quote-block></para></item><item prefix="B43"><para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                indirect supplies to EME, one of the new generating companies.”(emphasis in original)</para></item></list></quote-block></para></paragraph>
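
One point to watch in the transformation above: `($vQBImmed//suffix)[last()]` selects the last suffix anywhere below the quote-block. That works for the sample document, but it could pick up a nested block's suffix if the outer quote-block has no suffix of its own, or if its own suffix does not come last in document order. A possible tightening, offered only as a sketch under the same no-namespace assumption, is to select the suffix through the child axis:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:strip-space elements="*"/>

 <!-- text nodes outside suffix, grouped by their nearest quote-block -->
 <xsl:key name="kLastNonSufText"
   match="*[not(self::suffix)]/text()"
   use="generate-id(ancestor::quote-block[1])"/>

 <!-- identity template -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()[ancestor::quote-block]">
  <xsl:copy-of select="."/>

  <xsl:variable name="vQBImmed" select="ancestor::quote-block[1]"/>

  <!-- is this the last text node keyed to the nearest quote-block? -->
  <xsl:if test="generate-id() =
                generate-id(key('kLastNonSufText', generate-id($vQBImmed))[last()])">
   <!-- child axis: only suffix elements directly under this quote-block -->
   <xsl:copy-of select="$vQBImmed/suffix/text()"/>
  </xsl:if>
 </xsl:template>

 <!-- suffix elements themselves are dropped -->
 <xsl:template match="suffix"/>
</xsl:stylesheet>
```

For the sample document this should give the same placement as above, since each quote-block has exactly one suffix child.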