XSL: Creating list items from sentences in running XML text

Time: 2018-02-15 15:12:22

Tags: xml list xslt

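I need to tidy up a large number of XML texts in which lists appear inside running text. The idea is to put those lists into proper list elements so that different stylesheets can render them more consistently. At present, numbered lists in the running text use 1. 2. 3. or 1) 2) 3), and unnumbered lists use - (hyphen) or *.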

My XSL file (its structure does not manage to collect the list items inside a parent list element):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"  xmlns:fox="urn:lazy-fox-text" exclude-result-prefixes="fox">

<xsl:output method="xml" version="1.0" indent="yes" encoding="utf-8"/>
<xsl:strip-space elements="*"/>

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()">
    <xsl:analyze-string select="." regex="(\d\))(\s*(.*))"> 
    <!-- <xsl:analyze-string select="." regex="(\-)(\s*(.*))"> -->
        <xsl:matching-substring>
            <item xmlns="urn:lazy-fox-text">
                <xsl:value-of select="replace(.,'^[\d]\)\s*','')"/> 
            </item>
        </xsl:matching-substring>

        <xsl:non-matching-substring>
            <xsl:value-of select="."/>
        </xsl:non-matching-substring>
    </xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>

Sample input file:

<?xml version="1.0" encoding="utf-8"?>
<Data xmlns="urn:lazy-fox-text">
  <Text number="1">
    <Title>Lazy dog jumper</Title>
    <Description>It is true, that:  1) The quick brown fox jumps over the lazy dog.<p />2) The quick red fox jumps over the lazy dog.<p />3) The old grey fox jumps over the lazy dog. It really does!<p />But I have never seen a cat jumping over that dog.</Description>
  </Text>
  <Text number="2">
    <Title>Lazy foxer</Title>
    <Description>The quick brown fox <arg format="x" /> jumps over the lazy dog owner.<p/>Rules: <p/>-Dogs must be activated.<p/>-Dogs must not sleep all day.</Description>
  </Text>
  <Text number="3">
    <Title>Quickest jumper</Title>
    <Description>The quickest brown fox jumps over the lazy dog.<p />The slowest brown fox jumps over the laziest dog.</Description>
<Action>1. Teach the fox not to jump.<p />2. Teach the dog to bark when the fox jumps.</Action>
  </Text>
</Data>   

Desired output:

<?xml version="1.0" encoding="utf-8"?>
<Data xmlns="urn:lazy-fox-text">
   <Text number="1">
      <Title>Lazy dog jumper</Title>
      <Description>It is true, that:
        <list type="number">  
          <item>The quick brown fox jumps over the lazy dog.</item>
          <item>The quick red fox jumps over the lazy dog.</item>
          <item>The old grey fox jumps over the lazy dog. It really does!</item>
       </list>
       <p/>But I have never seen a cat jumping over that dog.
      </Description>
   </Text>
   <Text number="2">
      <Title>Lazy foxer</Title>
      <Description>The quick brown fox <arg format="x"/> jumps over the lazy dog owner.<p/>Rules: <p/>
        <list type="bullet">
            <item>Dogs must be activated.</item>
            <item>Dogs must not sleep all day.</item>
        </list>
      </Description>
   </Text>
   <Text number="3">
      <Title>Quickest jumper</Title>
      <Description>The quickest brown fox jumps over the lazy dog.<p/>The slowest brown fox jumps over the laziest dog.</Description>
      <Action>
        <list type="number">  
            <item>Teach the fox not to jump.</item>
            <item>Teach the dog to bark when the fox jumps.</item>
        </list>
      </Action>
   </Text>
</Data>

1 answer:

Answer 0 (score: 1)

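I have tried to ignore the empty p elements or, where they are adjacent to text containing items, to treat them as part of the list. A first mode transforms any text starting with a number, - or * into item elements. A second mode then uses for-each-group with group-adjacent to wrap adjacent item elements into a list and strips the empty p elements: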

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="urn:lazy-fox-text"
    xmlns="urn:lazy-fox-text"
    version="3.0">

  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:mode name="items" on-no-match="shallow-copy"/>
  <xsl:mode name="lists" on-no-match="shallow-copy"/>
  <xsl:mode name="strip" on-no-match="shallow-copy"/>

  <xsl:variable name="items">
      <xsl:apply-templates mode="items"/>
  </xsl:variable>

  <xsl:variable name="lists">
      <xsl:apply-templates select="$items/node()" mode="lists"/>
  </xsl:variable>

  <xsl:template match="text()" mode="items">
      <xsl:analyze-string select="." regex="([0-9]+[).]|-|\*)(\s*(.*))">
          <xsl:matching-substring>
              <item numeric="{matches(regex-group(1), '^[0-9]')}">
                  <xsl:value-of select="regex-group(2)"/>
              </item>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
              <xsl:value-of select="."/>
          </xsl:non-matching-substring>
      </xsl:analyze-string>
  </xsl:template>

  <xsl:template match="*[item]" mode="lists">
      <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:for-each-group select="node()" group-adjacent="boolean(self::item | self::p[not(node())])">
              <xsl:choose>
                  <xsl:when test="current-grouping-key() and current-group()[self::item]">
                      <list type="{if (current-group()[self::item[@numeric = 'true']]) then 'number' else 'bullet'}">
                          <xsl:apply-templates select="current-group()" mode="strip"/>
                      </list>
                  </xsl:when>
                  <xsl:otherwise>
                      <xsl:apply-templates select="current-group()" mode="#current"/>
                  </xsl:otherwise>
              </xsl:choose>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>

  <xsl:template match="item/@numeric | p[not(node())]" mode="strip"/>

  <xsl:template match="/">
      <xsl:copy-of select="$lists"/>
  </xsl:template>

</xsl:stylesheet>

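The output is not exactly as described: the item elements have some leading white space (but I suppose you can fix that), and some p elements in front of a list are swallowed: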

<Data xmlns="urn:lazy-fox-text">
   <Text number="1">
      <Title>Lazy dog jumper</Title>
      <Description>It is true, that:  <list type="number">
            <item> The quick brown fox jumps over the lazy dog.</item>
            <item> The quick red fox jumps over the lazy dog.</item>
            <item> The old grey fox jumps over the lazy dog. It really does!</item>
         </list>But I have never seen a cat jumping over that dog.</Description>
   </Text>
   <Text number="2">
      <Title>Lazy foxer</Title>
      <Description>The quick brown fox <arg format="x"/> jumps over the lazy dog owner.<p/>Rules: <list type="bullet">
            <item>Dogs must be activated.</item>
            <item>Dogs must not sleep all day.</item>
         </list>
      </Description>
   </Text>
   <Text number="3">
      <Title>Quickest jumper</Title>
      <Description>The quickest brown fox jumps over the lazy dog.<p/>The slowest brown fox jumps over the laziest dog.</Description>
      <Action>
         <list type="number">
            <item> Teach the fox not to jump.</item>
            <item> Teach the dog to bark when the fox jumps.</item>
         </list>
      </Action>
   </Text>
</Data>
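
The leading space in the item elements above comes from regex-group(2), which captures the white space after the list marker as well as the text. One possible fix, as a sketch rather than part of the original answer, is to output regex-group(3) instead, which in the same regular expression holds only the text after that white space. The template below is meant as a drop-in replacement for the text() template in mode items of the stylesheet above and relies on that stylesheet's namespace declarations:

```xml
<!-- Sketch: same template as in the stylesheet above; only the xsl:value-of line differs. -->
<xsl:template match="text()" mode="items">
    <xsl:analyze-string select="." regex="([0-9]+[).]|-|\*)(\s*(.*))">
        <xsl:matching-substring>
            <item numeric="{matches(regex-group(1), '^[0-9]')}">
                <!-- Group 3 is the text after the marker and any following white space. -->
                <xsl:value-of select="regex-group(3)"/>
            </item>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
            <xsl:value-of select="."/>
        </xsl:non-matching-substring>
    </xsl:analyze-string>
</xsl:template>
```

Because \s* is greedy, group 3 cannot start with white space, so no separate trimming step should be needed for the leading blank.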

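The code is XSLT 3 and as such works with all editions of Saxon 9.8, with Saxon 9.7 PE or EE, and with Altova 2017 or 2018. If you need XSLT 2, replace all xsl:mode elements with the identity transformation: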

<xsl:template match="@* | node()" mode="#all">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()" mode="#current"/>
  </xsl:copy>
</xsl:template>
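
One caveat for the XSLT 2 variant, offered as a hedged note rather than as part of the original answer: the patterns @* | node(), text() and / all have the same default priority of -0.5, so for text nodes in mode items the identity template and the text() template both match. XSLT 2.0 treats that as a recoverable error, and a processor that recovers picks the template that comes last in the stylesheet, possibly with a warning. Placing the identity template where the xsl:mode elements were keeps it first, so the more specific templates should still win. Alternatively, an explicit lower priority avoids the ambiguity altogether. The priority value below is an assumption chosen for illustration:

```xml
<!-- Identity rule for all modes. priority="-1" is an addition, not in the original answer;
     it ranks this rule below templates that only have the default priority of -0.5. -->
<xsl:template match="@* | node()" mode="#all" priority="-1">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()" mode="#current"/>
  </xsl:copy>
</xsl:template>
```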