XSLT: merge adjacent siblings that share the same attribute into one element, concatenating their text

Time: 2015-01-22 00:30:39

Tags: xml xslt

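Background: the following is an XML extract from an MS Word form, already modified by XSLT. Some of the text from the MS Word form has somehow been split across several elements and needs to be recombined into a single element. Below is an actual fragment of the penultimate XML input: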

<Section coord="2.13" posn="2" of="13">
    <Segment coord="1.25" rowno="1" of="25">
        <Entry coord="1.1" colno="1" of="1" desgn="Table">QUAL</Entry>
        <Entry coord="1.1" colno="1" of="1" desgn="Table">I</Entry>
        <Entry coord="1.1" colno="1" of="1" desgn="Table">FICATIONS</Entry>
    </Segment>
    <Segment coord="2.25" rowno="2" of="25">
        <Entry coord="1.1" colno="1" of="1" desgn="Table">ACADEMIC QUALIFICATIONS</Entry>
        <Entry coord="1.1" colno="1" of="1" desgn="Table"> (Most recent first)</Entry>
    </Segment>
    <Segment coord="3.25" rowno="3" of="25">
        <Entry coord="1.4" colno="1" of="4" desgn="Column">Degree/Diploma/Certificate</Entry>
        <Entry coord="2.4" colno="2" of="4" desgn="Column">Institution</Entry>
        <Entry coord="3.4" colno="3" of="4" desgn="Column">Date Conferred</Entry>
        <Entry coord="3.4" colno="3" of="4" desgn="Column">(mm/yyyy)</Entry>
        <Entry coord="4.4" colno="4" of="4" desgn="Column">SAQA Evaluated?</Entry>
        <Entry coord="4.4" colno="4" of="4" desgn="Column">(If not SA qualification)</Entry>
    </Segment>
    <Segment coord="4.25" rowno="4" of="25"/>
    <!-- remaining 21 Segments from Section deleted ... -->
</Section>

Note: the @coord attribute is built from each sibling's position() and last(), written as "position().last()".

Desired merged output: for example, in the Segment with @coord 1.25, the three Entries need to be merged into a single Entry:

<Entry coord="1.1" colno="1" of="1" desgn="Table">QUALIFICATIONS</Entry>

with their text concatenated into one string.

Similarly, Segment 2.25 has two Entries that should be merged into:

<Entry coord="1.1" colno="1" of="1" desgn="Table">ACADEMIC QUALIFICATIONS (Most recent first)</Entry>

The same applies to the last two pairs in Segment 3.25, which merge into two distinct Entries:

<Entry coord="3.4" colno="3" of="4" desgn="Column">Date Conferred(mm/yyyy)</Entry>

<Entry coord="4.4" colno="4" of="4" desgn="Column">SAQA Evaluated?(If not SA qualification)</Entry>

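I can test (in document order) for a repeated @coord, e.g. test="@coord = following-sibling::Entry/@coord" to start concatenating, or test="@coord != preceding-sibling::Entry/@coord" to stop concatenating. My difficulty is deferring the xsl:copy while the text is being concatenated; in document order it gets messy. My failed and unfinished attempt below performs only a single concatenation instead of as many as are needed: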

  <xsl:template match="Segment">
      <xsl:for-each select="Entry" >
          <xsl:choose>
            <xsl:when test="position()= 1 and (@coord = following-sibling::Entry/@coord)" >
              <xsl:copy>
                  <xsl:value-of select="@*"/><xsl:value-of select="text()" /> <xsl:value-of select="following-sibling::Entry/text()" />
              </xsl:copy>
            </xsl:when>
            <xsl:when test="@coord != preceding-sibling::Entry/@coord" >
              <xsl:copy>
                  <xsl:value-of select="@*"/><xsl:value-of select="text()" />
              </xsl:copy>
            </xsl:when>
            <xsl:otherwise>
                <xsl:for-each select=".">
                   <xsl:if test="@coord = following-sibling::Entry/@coord" >    
                       <xsl:value-of select="following-sibling::Entry/text()" />
                  </xsl:if>          
                </xsl:for-each>
            </xsl:otherwise>
          </xsl:choose>
          <xsl:copy>
              <xsl:apply-templates select="node()|@*"/>
          </xsl:copy>
      </xsl:for-each>
  </xsl:template>
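
One thing that may be tripping up the attempt above is how XPath 1.0 compares a value with a node-set: the comparison is existential. So @coord != preceding-sibling::Entry/@coord is true as soon as any preceding Entry has a different coord, and false when there is no preceding Entry at all. The small diagnostic stylesheet below is only a sketch for exploring this, not part of the original attempt. It prints both forms of the test for every Entry:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//Entry">
      <xsl:value-of select="@coord"/>
      <!-- true if ANY preceding sibling Entry has a different coord -->
      <xsl:text>  != : </xsl:text>
      <xsl:value-of select="@coord != preceding-sibling::Entry/@coord"/>
      <!-- true only if NO preceding sibling Entry has the same coord -->
      <xsl:text>  not(=) : </xsl:text>
      <xsl:value-of select="not(@coord = preceding-sibling::Entry/@coord)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the fourth Entry of Segment 3.25 (the second one with coord 3.4) the != test is still true, because the earlier 1.4 and 2.4 Entries differ from it, whereas the not(=) test is false. The first Entry of every Segment shows the opposite: != is false because there is nothing to compare against, and not(=) is true. The not(=) form is therefore the one that reliably identifies "first Entry with this coord".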

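It seems that concatenating in reverse document order might be more natural, but even just thinking it through still gets messy. What is the best way to attack this problem?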
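
If only strictly adjacent Entries should be merged, as the title asks, one possible XSLT 1.0 approach is sibling recursion: an Entry that starts a run is copied, and its content is built by walking forward to the immediately following Entry for as long as the coord still matches. The stylesheet below is a minimal sketch under that assumption; the mode name "run" is made up for illustration, and the attributes of the merged Entry are simply taken from the first Entry of the run.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform: copy everything not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- an Entry that starts a run: copy it and collect the text of the whole run -->
  <xsl:template match="Entry">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="." mode="run"/>
    </xsl:copy>
  </xsl:template>

  <!-- an Entry that continues a run: emit nothing, the run's first Entry
       has already collected its text (the predicate gives this template
       a higher default priority than the plain Entry template) -->
  <xsl:template match="Entry[@coord = preceding-sibling::Entry[1]/@coord]"/>

  <!-- output this Entry's text, then step to the next Entry only if it
       is the immediate following Entry and has the same coord -->
  <xsl:template match="Entry" mode="run">
    <xsl:value-of select="."/>
    <xsl:apply-templates
        select="following-sibling::Entry[1][@coord = current()/@coord]"
        mode="run"/>
  </xsl:template>
</xsl:stylesheet>
```

Because each step looks only at the nearest sibling, two Entries with the same coord that are separated by an Entry with a different coord stay separate, which differs from grouping all same-coord Entries under one parent.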

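Following my comment on answer 2: how can the answer be extended, as suggested, to do additional processing on the parent? The modified input carries parent attributes (frags = the number of child text fragments, size = the total string length of the concatenated text fragments) that need to be filled in; they are shown as empty attributes in the XML input below.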

<Section coord="2.13" posn="2" of="13">
<Segment coord="1.25" rowno="1" of="25" frags="" size="">
<Entry coord="1.1" colno="1" of="1" desgn="Table" size="4">QUAL</Entry>
<Entry coord="1.1" colno="1" of="1" desgn="Table" size="1">I</Entry>
<Entry coord="1.1" colno="1" of="1" desgn="Table" size="9">FICATIONS</Entry>
</Segment>
<Segment coord="2.25" rowno="2" of="25" frags="" size="">
<Entry coord="1.1" colno="1" of="1" desgn="Table" size="23">ACADEMIC QUALIFICATIONS</Entry>
<Entry coord="1.1" colno="1" of="1" desgn="Table" size="20"> (Most recent first)</Entry>
</Segment>
<Segment coord="3.25" rowno="3" of="25" frags="" size="">
<Entry coord="1.4" colno="1" of="4" desgn="Column" size="26">Degree/Diploma/Certificate</Entry>
<Entry coord="2.4" colno="2" of="4" desgn="Column" size="11">Institution</Entry>
<Entry coord="3.4" colno="3" of="4" desgn="Column" size="14">Date Conferred</Entry>
<Entry coord="3.4" colno="3" of="4" desgn="Column" size="9">(mm/yyyy)</Entry>
<Entry coord="4.4" colno="4" of="4" desgn="Column" size="15">SAQA Evaluated?</Entry>
<Entry coord="4.4" colno="4" of="4" desgn="Column" size="25">(If not SA qualification)</Entry>
</Segment>
<!-- delete -->
</Section>

Expected output of the extra processing on the parent (Segment) element:

<!-- deleted prior input xml -->
<Segment coord="1.25" rowno="1" of="25" frags="3" size="14">
<!-- deleted collapsed Entries as transformed -->
</Segment>
<Segment coord="2.25" rowno="2" of="25" frags="2" size="43">
<!-- deleted collapsed Entries as transformed -->
</Segment>
<Segment coord="3.25" rowno="3" of="25" frags="6" size="100">
<!-- deleted collapsed Entries as transformed -->
</Segment>
<!-- deleted rest of input xml -->

2 answers:

Answer 0 (score: 1)

Try this XSLT 1.0 stylesheet (an XSLT 2.0 version would be much simpler):

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform template -->
<xsl:template match="node() | @*">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Segment">
    <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:for-each select="Entry[not(@coord = preceding-sibling::Entry/@coord)]">
            <xsl:variable name="Coord" select="@coord"/>
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:for-each select="../Entry[@coord = $Coord]">
                    <xsl:value-of select="."/>
                </xsl:for-each>
            </xsl:copy>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>
</xsl:transform>
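
For comparison, the simpler XSLT 2.0 version mentioned above could look roughly like the following. This is a sketch, not the answerer's own code: it assumes an XSLT 2.0 processor and that every Entry carries a coord attribute, since an empty grouping key is an error with group-adjacent.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform template -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Segment">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- one group per run of adjacent Entries with the same coord -->
      <xsl:for-each-group select="Entry" group-adjacent="@coord">
        <!-- the context item is the first Entry of the group -->
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <!-- join the text of all Entries in the run with no separator -->
          <xsl:value-of select="current-group()" separator=""/>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The explicit separator="" matters: with a select attribute, XSLT 2.0 otherwise joins the items with a single space. Using group-by instead of group-adjacent would reproduce the behaviour of the 1.0 stylesheet above, which merges all Entries with the same coord in a Segment whether or not they are adjacent.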

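Answer 1 (score: 1)

Works with XSLT 1 and 2, but only if the following holds: two Entries should be collapsed whenever they have the same parent and the same coord.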

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes"/>

    <xsl:key name="entry-by-coord" match="Entry" use="concat(generate-id(parent::*), '||', @coord)"/>

    <xsl:template match="/">
        <xsl:apply-templates select="*"/>
    </xsl:template>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="Entry">
        <xsl:if test="generate-id()=generate-id(key('entry-by-coord', concat(generate-id(parent::*), '||', @coord))[1])">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:copy-of select="key('entry-by-coord', concat(generate-id(parent::*), '||', @coord))/text()"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
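
For the follow-up in the question (filling in frags and size on each Segment), one possible extension is to override just those two attributes and let the identity template copy the rest. The sketch below assumes the modified input shown in the question, where every Entry already has a size attribute and every Segment to be filled has empty frags and size attributes; it is an illustration rather than either answerer's solution.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- frags: number of Entry fragments in this Segment in the input -->
  <xsl:template match="Segment/@frags">
    <xsl:attribute name="frags">
      <xsl:value-of select="count(../Entry)"/>
    </xsl:attribute>
  </xsl:template>

  <!-- size: total of the size attributes of this Segment's Entries -->
  <xsl:template match="Segment/@size">
    <xsl:attribute name="size">
      <xsl:value-of select="sum(../Entry/@size)"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```

On its own this stylesheet fills the Segment attributes and leaves the Entries unmerged. The two attribute templates can be added to the key-based stylesheet above as they are, because there Segment elements go through the identity template. In the first answer's stylesheet the Segment template would need xsl:apply-templates select="@*" in place of xsl:copy-of select="@*" for them to fire. Note that a merged Entry still carries the size attribute of its first fragment unless that is recomputed as well.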