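I need to tidy up a lot of XML texts in which lists appear inline in the running text. The idea is to put these lists into proper list elements, so that different stylesheets can render them more consistently. Today the numbered lists in the running text use 1. 2. 3. or 1) 2) 3), and the unnumbered lists use - (hyphen) or *. Below are my attempt so far, a sample input, and the output I would like to get.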


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"  xmlns:fox="urn:lazy-fox-text" exclude-result-prefixes="fox">

<xsl:output method="xml" version="1.0" indent="yes" encoding="utf-8"/>
<xsl:strip-space elements="*"/>

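<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()">
    <xsl:analyze-string select="." regex="(\d\))(\s*(.*))"> 
    <!-- <xsl:analyze-string select="." regex="(\-)(\s*(.*))"> -->
        <xsl:matching-substring>
            <item xmlns="urn:lazy-fox-text">
                <xsl:value-of select="replace(.,'^[\d]\)\s*','')"/> 
            </item>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
            <xsl:value-of select="."/>
        </xsl:non-matching-substring>
    </xsl:analyze-string>
</xsl:template>

</xsl:stylesheet>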


<?xml version="1.0" encoding="utf-8"?>
<Data xmlns="urn:lazy-fox-text">
  <Text number="1">
    <Title>Lazy dog jumper</Title>
    <Description>It is true, that:  1) The quick brown fox jumps over the lazy dog.<p />2) The quick red fox jumps over the lazy dog.<p />3) The old grey fox jumps over the lazy dog. It really does!<p />But I have never seen a cat jumping over that dog.</Description>
  </Text>
  <Text number="2">
    <Title>Lazy foxer</Title>
    <Description>The quick brown fox <arg format="x" /> jumps over the lazy dog owner.<p/>Rules: <p/>-Dogs must be activated.<p/>-Dogs must not sleep all day.</Description>
  </Text>
  <Text number="3">
    <Title>Quickest jumper</Title>
    <Description>The quickest brown fox jumps over the lazy dog.<p />The slowest brown fox jumps over the laziest dog.</Description>
    <Action>1. Teach the fox not to jump.<p />2. Teach the dog to bark when the fox jumps.</Action>
  </Text>
</Data>


<?xml version="1.0" encoding="utf-8"?>
<Data xmlns="urn:lazy-fox-text">
   <Text number="1">
      <Title>Lazy dog jumper</Title>
      <Description>It is true, that:
        <list type="number">  
          <item>The quick brown fox jumps over the lazy dog.</item>
          <item>The quick red fox jumps over the lazy dog.</item>
          <item>The old grey fox jumps over the lazy dog. It really does!</item>
        </list>
        <p/>But I have never seen a cat jumping over that dog.
      </Description>
   </Text>
   <Text number="2">
      <Title>Lazy foxer</Title>
      <Description>The quick brown fox <arg format="x"/> jumps over the lazy dog owner.<p/>Rules: <p/>
        <list type="bullet">
            <item>Dogs must be activated.</item>
            <item>Dogs must not sleep all day.</item>
        </list>
      </Description>
   </Text>
   <Text number="3">
      <Title>Quickest jumper</Title>
      <Description>The quickest brown fox jumps over the lazy dog.<p/>The slowest brown fox jumps over the laziest dog.</Description>
      <Action>
        <list type="number">  
            <item>Teach the fox not to jump.</item>
            <item>Teach the dog to bark when the fox jumps.</item>
        </list>
      </Action>
   </Text>
</Data>

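I have tried to ignore the empty p elements, or rather to treat them as part of a list when they are adjacent to text with items. A first mode transforms any text starting with a number, - or * into item elements. A second mode then wraps adjacent item elements into a list using for-each-group group-adjacent and strips the empty p elements: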

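<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="urn:lazy-fox-text"
    xmlns="urn:lazy-fox-text"
    version="3.0">

  <!-- The two namespace attributes above are assumed (the listing lost them):
       they make item, list and p resolve to urn:lazy-fox-text. -->

  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:mode name="items" on-no-match="shallow-copy"/>
  <xsl:mode name="lists" on-no-match="shallow-copy"/>
  <xsl:mode name="strip" on-no-match="shallow-copy"/>

  <!-- Pass 1: turn list-like text into item elements. -->
  <xsl:variable name="items">
      <xsl:apply-templates mode="items"/>
  </xsl:variable>

  <!-- Pass 2: wrap adjacent items into list elements. -->
  <xsl:variable name="lists">
      <xsl:apply-templates select="$items/node()" mode="lists"/>
  </xsl:variable>

  <xsl:template match="text()" mode="items">
      <xsl:analyze-string select="." regex="([0-9]+[).]|-|\*)(\s*(.*))">
          <xsl:matching-substring>
              <item numeric="{matches(regex-group(1), '^[0-9]')}">
                  <xsl:value-of select="regex-group(2)"/>
              </item>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
              <xsl:value-of select="."/>
          </xsl:non-matching-substring>
      </xsl:analyze-string>
  </xsl:template>

  <xsl:template match="*[item]" mode="lists">
      <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:for-each-group select="node()" group-adjacent="boolean(self::item | self::p[not(node())])">
              <xsl:choose>
                  <xsl:when test="current-grouping-key() and current-group()[self::item]">
                      <list type="{if (current-group()[self::item[@numeric = 'true']]) then 'number' else 'bullet'}">
                          <xsl:apply-templates select="current-group()" mode="strip"/>
                      </list>
                  </xsl:when>
                  <xsl:otherwise>
                      <xsl:apply-templates select="current-group()" mode="#current"/>
                  </xsl:otherwise>
              </xsl:choose>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>

  <!-- Drop the helper attribute and the empty p elements inside a list. -->
  <xsl:template match="item/@numeric | p[not(node())]" mode="strip"/>

  <xsl:template match="/">
      <xsl:copy-of select="$lists"/>
  </xsl:template>

</xsl:stylesheet>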

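The output is not exactly the same as the one described (the item elements have some leading white space, but I think you can fix that), and some p elements before a list are swallowed: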

<Data xmlns="urn:lazy-fox-text">
   <Text number="1">
      <Title>Lazy dog jumper</Title>
      <Description>It is true, that:  <list type="number">
            <item> The quick brown fox jumps over the lazy dog.</item>
            <item> The quick red fox jumps over the lazy dog.</item>
            <item> The old grey fox jumps over the lazy dog. It really does!</item>
         </list>But I have never seen a cat jumping over that dog.</Description>
   </Text>
   <Text number="2">
      <Title>Lazy foxer</Title>
      <Description>The quick brown fox <arg format="x"/> jumps over the lazy dog owner.<p/>Rules: <list type="bullet">
            <item>Dogs must be activated.</item>
            <item>Dogs must not sleep all day.</item>
         </list>
      </Description>
   </Text>
   <Text number="3">
      <Title>Quickest jumper</Title>
      <Description>The quickest brown fox jumps over the lazy dog.<p/>The slowest brown fox jumps over the laziest dog.</Description>
      <Action>
         <list type="number">
            <item> Teach the fox not to jump.</item>
            <item> Teach the dog to bark when the fox jumps.</item>
         </list>
      </Action>
   </Text>
</Data>
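
The leading white space in the items comes from writing regex-group(2), which covers the white space after the marker as well as the text. A minimal sketch of one possible fix, assuming the same regex is kept: write regex-group(3) instead, since that group is the inner (.*) and starts after the white space. The stylesheet below is a standalone illustration of the item step only (default mode, plain identity template, no list wrapping); in the full stylesheet the same one-line change would go into the text() template of the items mode.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="urn:lazy-fox-text"
    version="2.0">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Same regex as in the answer; only the group written into the item differs. -->
  <xsl:template match="text()">
    <xsl:analyze-string select="." regex="([0-9]+[).]|-|\*)(\s*(.*))">
      <xsl:matching-substring>
        <item numeric="{matches(regex-group(1), '^[0-9]')}">
          <!-- Group 3 is the text after the white space, so no leading blanks are copied. -->
          <xsl:value-of select="regex-group(3)"/>
        </item>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The text() template is placed after the identity template on purpose: both patterns have the same default priority, so the later declaration is the one a processor picks when it recovers from the conflict.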

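The code is XSLT 3, so it runs with all editions of Saxon 9.8, with Saxon 9.7 PE or EE, and with Altova 2017 or 2018. If you need XSLT 2, replace all the xsl:mode elements with the identity transformation: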

<xsl:template match="@* | node()" mode="#all">
    <xsl:copy><xsl:apply-templates select="@* | node()" mode="#current"/></xsl:copy>
</xsl:template>