Wrapping multiple sequences of list elements using an XSL transformation (XML to XML)

Time: 2011-06-04 05:22:26

Tags: xml xslt transformation

I have some long input XML documents (about 3k lines each) that generally look like this:

<chapter someAttributes="someValues">
    <title>someTitle</title>

    <p>multiple paragraphs</p>
    <p>...</p>

    <li>
        <p>- some text</p>
    </li>
    <li>
        <p>- some other text</p>
    </li>
    <!-- another li elements -->

    <p>multiple other paragraphs</p>
    <p>...</p>

    <li>
        <p>1. some text</p>
    </li>
    <li>
        <p>2. some other text</p>
    </li>
    <!-- another li elements -->

    <p>multiple other paragraphs</p>
    <p>...</p>

    <!-- there are other elements such as table, illustration, ul etc. -->  
</chapter>

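What I want is to wrap each sequence of li elements that is scattered through the document (I mean between paragraphs, tables, illustrations, etc.) in a ul or ol element, depending on some semantics, and return the wrapped XML.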

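  • If the first character of the paragraph is -, the sequence should become a ul with a mark="DASH" attribute.
  • If the paragraph starts with 1., 2., 3., etc., I want an ol with numeration="ARABIC".

For example (this is just one sequence):

<ul mark="DASH">
    <li>
        <p> some text</p>
    </li>
    <li>
        <p> some other text</p>
    </li>
</ul>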

As you can see, I also need to cut the "marking characters" out of all the paragraphs, i.e. -, 1., 2., 3., etc.

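The input XML is more complex than I have described (nested sequences, inner sequences inside table elements), but I am looking for some ideas, especially on how to catch and process specific sequences with these semantics.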

I want the output XML to keep exactly the same order, with only the li elements wrapped. XSLT 2.0 / EXSLT can be used if needed.

2 answers:

Answer 0 (score: 3)

Here is an XSLT 2.0 stylesheet:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output indent="yes"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <xsl:copy>
      <xsl:for-each-group select="*" group-adjacent="boolean(self::li)">
        <xsl:choose>
          <xsl:when test="current-grouping-key() and ./p[1][starts-with(., '-')]">
            <ul mark="DASH">
              <xsl:apply-templates select="current-group()"/>
            </ul>
          </xsl:when>
          <xsl:when test="current-grouping-key() and ./p[1][matches(., '[0-9]\.')]">
            <ol numeration="arabic">
              <xsl:apply-templates select="current-group()"/>
            </ol>
          </xsl:when>
          <xsl:otherwise>
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="li/p/text()[1]">
    <xsl:value-of select="replace(., '^(-|[0-9]\.)', '')"/>
  </xsl:template>

</xsl:stylesheet>

When I run this stylesheet with Saxon 9.3 against the sample input

<chapter someAttributes="someValues">
    <title>someTitle</title>

    <p>multiple paragraphs</p>
    <p>...</p>

    <li>
        <p>- some text</p>
    </li>
    <li>
        <p>- some other text</p>
    </li>
    <!-- another li elements -->

    <p>multiple other paragraphs</p>
    <p>...</p>

    <li>
        <p>1. some text</p>
    </li>
    <li>
        <p>2. some other text</p>
    </li>
    <!-- another li elements -->

    <p>multiple other paragraphs</p>
    <p>...</p>

    <!-- there are other elements such as table, illustration, ul etc. -->  
</chapter>

I get the following output:

<?xml version="1.0" encoding="UTF-8"?>
<chapter>
   <title>someTitle</title>
   <p>multiple paragraphs</p>
   <p>...</p>
   <ul mark="DASH">
      <li>
        <p> some text</p>
      </li>
      <li>
        <p> some other text</p>
      </li>
   </ul>
   <p>multiple other paragraphs</p>
   <p>...</p>
   <ol numeration="arabic">
      <li>
        <p> some text</p>
      </li>
      <li>
        <p> some other text</p>
      </li>
   </ol>
   <p>multiple other paragraphs</p>
   <p>...</p>
</chapter>
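
Note that in the output above the chapter element has lost its someAttributes attribute and the comments are gone: the chapter template copies the element without processing @*, and it groups element children only. The numbering test is also unanchored, and the replace call strips only a single digit and leaves the space after the marker. A minimal sketch of a variant that addresses the attribute, anchoring, and marker issues follows. It has not been run against the real documents. Matching on *[li] instead of chapter is an assumption meant to cover the nested sequences and the sequences inside tables mentioned in the question, and numeration="ARABIC" follows the spelling the question asks for. Like the original, it still groups element children only, so comments and text sitting directly inside such a container are dropped.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output indent="yes"/>

  <!-- identity template: copies everything not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- assumption: any element with li children may hold list runs, not only chapter -->
  <xsl:template match="*[li]">
    <xsl:copy>
      <!-- keep the container's own attributes -->
      <xsl:apply-templates select="@*"/>
      <xsl:for-each-group select="*" group-adjacent="boolean(self::li)">
        <xsl:choose>
          <!-- run of li whose first paragraph starts with a dash -->
          <xsl:when test="current-grouping-key() and matches(p[1], '^\s*-')">
            <ul mark="DASH">
              <xsl:apply-templates select="current-group()"/>
            </ul>
          </xsl:when>
          <!-- run of li whose first paragraph starts with digits and a dot -->
          <xsl:when test="current-grouping-key() and matches(p[1], '^\s*[0-9]+\.')">
            <ol numeration="ARABIC">
              <xsl:apply-templates select="current-group()"/>
            </ol>
          </xsl:when>
          <!-- everything else is passed on, so nested containers are processed too -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- strip the marker (dash, or one or more digits and a dot) and the whitespace around it -->
  <xsl:template match="li/p/text()[1]">
    <xsl:value-of select="replace(., '^\s*(-|[0-9]+\.)\s*', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

Because every branch uses xsl:apply-templates rather than xsl:copy-of, elements that are not li (a table, for instance) are walked by the identity template, so a run of li elements deeper in the tree should reach the *[li] template as well.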

Answer 1 (score: 1)

Here is a complete solution in a functional style, without any procedural constructs such as xsl:for-each-group or xsl:if.

Tested under Saxon-B 9.0.0.1J.

XSLT 2.0

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes" method="html"/>

    <xsl:strip-space elements="*"/>

    <!-- identity -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- override dash list elements -->
    <xsl:template match="li[(name(preceding-sibling::*[position()=1]) 
        != name(current())) 
        and matches(.,'^-')]">

        <ul mark="DASH">
            <li><xsl:apply-templates/></li>
            <!-- apply recursive template for adjacent nodes -->
            <xsl:apply-templates select="following-sibling::*[1][name()
                =name(current())]" mode="next"/>
        </ul>
    </xsl:template>

    <!-- override numeration list elements -->
    <xsl:template match="li[(name(preceding-sibling::*[position()=1]) 
        != name(current())) 
        and matches(.,'^[0-9]\.')]">
        <ol numeration="ARABIC">
            <li><xsl:apply-templates/></li>
            <xsl:apply-templates select="following-sibling::*[1][name()
                =name(current())]" mode="next"/>
        </ol>
    </xsl:template>

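    <!-- recursive template for adjacent nodes -->
    <xsl:template match="*" mode="next">
        <li><xsl:apply-templates/></li>
        <xsl:apply-templates select="following-sibling::*[1][name()
            =name(current())]" mode="next"/>
    </xsl:template>

    <!-- suppress the non-first li of a run in the default mode: they are
         already emitted through the "next" mode, so the identity template
         would otherwise copy them a second time after the list
         (assumes every run starts with a marked li) -->
    <xsl:template match="li[preceding-sibling::*[1][self::li]]"/>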

    <!-- remove marks/numeration from first text node -->
    <xsl:template match="li/p/text()[1]">
        <xsl:value-of select="replace(., '^(-|[0-9]\.)\s+', '')"/>
    </xsl:template>

</xsl:stylesheet>

Applied to your input, it produces:

<chapter someAttributes="someValues">
   <title>someTitle</title>
   <p>multiple paragraphs</p>
   <p>...</p>
   <ul mark="DASH">
      <li>
         <p>some text</p>
      </li>
      <li>
         <p>some other text</p>
      </li>
   </ul>
   <!-- another li elements -->
   <p>multiple other paragraphs</p>
   <p>...</p>
   <ol numeration="ARABIC">
      <li>
         <p>some text</p>
      </li>
      <li>
         <p>some other text</p>
      </li>
   </ol>
   <!-- another li elements -->
   <p>multiple other paragraphs</p>
   <p>...</p>
   <!-- there are other elements such as table, illustration, ul etc. -->
</chapter>