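I need to tidy up a large number of XML texts in which lists appear inside running text. The idea is to put the lists into proper list elements so that different stylesheets can render them more consistently. At present, numbered lists in the running text use 1. 2. 3. or 1) 2) 3), and unnumbered lists use - (hyphen) or *.
My XSL file (its structure cannot collect the list items into a parent list element):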
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns:fox="urn:lazy-fox-text" exclude-result-prefixes="fox">
<xsl:output method="xml" version="1.0" indent="yes" encoding="utf-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<xsl:analyze-string select="." regex="(\d\))(\s*(.*))">
<!-- <xsl:analyze-string select="." regex="(\-)(\s*(.*))"> -->
<xsl:matching-substring>
<item xmlns="urn:lazy-fox-text">
<xsl:value-of select="replace(.,'^[\d]\)\s*','')"/>
</item>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
Sample input file:
<?xml version="1.0" encoding="utf-8"?>
<Data xmlns="urn:lazy-fox-text">
<Text number="1">
<Title>Lazy dog jumper</Title>
<Description>It is true, that: 1) The quick brown fox jumps over the lazy dog.<p />2) The quick red fox jumps over the lazy dog.<p />3) The old grey fox jumps over the lazy dog. It really does!<p />But I have never seen a cat jumping over that dog.</Description>
</Text>
<Text number="2">
<Title>Lazy foxer</Title>
<Description>The quick brown fox <arg format="x" /> jumps over the lazy dog owner.<p/>Rules: <p/>-Dogs must be activated.<p/>-Dogs must not sleep all day.</Description>
</Text>
<Text number="3">
<Title>Quickest jumper</Title>
<Description>The quickest brown fox jumps over the lazy dog.<p />The slowest brown fox jumps over the laziest dog.</Description>
<Action>1. Teach the fox not to jump.<p />2. Teach the dog to bark when the fox jumps.</Action>
</Text>
</Data>
Desired output:
<?xml version="1.0" encoding="utf-8"?>
<Data xmlns="urn:lazy-fox-text">
<Text number="1">
<Title>Lazy dog jumper</Title>
<Description>It is true, that:
<list type="number">
<item>The quick brown fox jumps over the lazy dog.</item>
<item>The quick red fox jumps over the lazy dog.</item>
<item>The old grey fox jumps over the lazy dog. It really does!</item>
</list>
<p/>But I have never seen a cat jumping over that dog.
</Description>
</Text>
<Text number="2">
<Title>Lazy foxer</Title>
<Description>The quick brown fox <arg format="x"/> jumps over the lazy dog owner.<p/>Rules: <p/>
<list type="bullet">
<item>Dogs must be activated.</item>
<item>Dogs must not sleep all day.</item>
</list>
</Description>
</Text>
<Text number="3">
<Title>Quickest jumper</Title>
<Description>The quickest brown fox jumps over the lazy dog.<p/>The slowest brown fox jumps over the laziest dog.</Description>
<Action>
<list type="number">
<item>Teach the fox not to jump.</item>
<item>Teach the dog to bark when the fox jumps.</item>
</list>
</Action>
</Text>
</Data>
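Answer 0 (score: 1)
I tried to ignore empty `p` elements, or rather to treat them as part of a list when they are adjacent to text containing items. So I have a first mode that converts any text starting with a number or `-` or `*` into `item` elements, and then a second mode that uses `for-each-group group-adjacent` to wrap adjacent `item` elements into a `list` and strips the empty `p` elements: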
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="urn:lazy-fox-text"
xmlns="urn:lazy-fox-text"
version="3.0">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:mode name="items" on-no-match="shallow-copy"/>
<xsl:mode name="lists" on-no-match="shallow-copy"/>
<xsl:mode name="strip" on-no-match="shallow-copy"/>
<xsl:variable name="items">
<xsl:apply-templates mode="items"/>
</xsl:variable>
<xsl:variable name="lists">
<xsl:apply-templates select="$items/node()" mode="lists"/>
</xsl:variable>
<xsl:template match="text()" mode="items">
<xsl:analyze-string select="." regex="([0-9]+[).]|-|\*)(\s*(.*))">
<xsl:matching-substring>
<item numeric="{matches(regex-group(1), '^[0-9]')}">
<xsl:value-of select="regex-group(2)"/>
</item>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="*[item]" mode="lists">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:for-each-group select="node()" group-adjacent="boolean(self::item | self::p[not(node())])">
<xsl:choose>
<xsl:when test="current-grouping-key() and current-group()[self::item]">
<list type="{if (current-group()[self::item[@numeric = 'true']]) then 'number' else 'bullet'}">
<xsl:apply-templates select="current-group()" mode="strip"/>
</list>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="current-group()" mode="#current"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
<xsl:template match="item/@numeric | p[not(node())]" mode="strip"/>
<xsl:template match="/">
<xsl:copy-of select="$lists"/>
</xsl:template>
</xsl:stylesheet>
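The output is not exactly as described: the `item` elements have some leading white space (but I think you can fix that), and some `p` elements next to a list are swallowed: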
<Data xmlns="urn:lazy-fox-text">
<Text number="1">
<Title>Lazy dog jumper</Title>
<Description>It is true, that: <list type="number">
<item> The quick brown fox jumps over the lazy dog.</item>
<item> The quick red fox jumps over the lazy dog.</item>
<item> The old grey fox jumps over the lazy dog. It really does!</item>
</list>But I have never seen a cat jumping over that dog.</Description>
</Text>
<Text number="2">
<Title>Lazy foxer</Title>
<Description>The quick brown fox <arg format="x"/> jumps over the lazy dog owner.<p/>Rules: <list type="bullet">
<item>Dogs must be activated.</item>
<item>Dogs must not sleep all day.</item>
</list>
</Description>
</Text>
<Text number="3">
<Title>Quickest jumper</Title>
<Description>The quickest brown fox jumps over the lazy dog.<p/>The slowest brown fox jumps over the laziest dog.</Description>
<Action>
<list type="number">
<item> Teach the fox not to jump.</item>
<item> Teach the dog to bark when the fox jumps.</item>
</list>
</Action>
</Text>
</Data>
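The leading white space in the `item` elements comes from `regex-group(2)`, which in the stylesheet above also captures the `\s*` that follows the list marker. A minimal sketch of one way to avoid it, assuming that dropping the white space directly after the marker is all that is needed: move `\s*` out of the captured group so that group 2 holds only the item text. This template is meant as a drop-in replacement for the `text()` template in mode `items`; the rest of the stylesheet stays unchanged.

```xml
<!-- Sketch: replaces the text() template in mode "items" of the stylesheet above. -->
<xsl:template match="text()" mode="items">
  <!-- group 1: list marker; \s* is matched but not captured; group 2: item text -->
  <xsl:analyze-string select="." regex="([0-9]+[).]|-|\*)\s*(.*)">
    <xsl:matching-substring>
      <item numeric="{matches(regex-group(1), '^[0-9]')}">
        <xsl:value-of select="regex-group(2)"/>
      </item>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
```

Trailing white space is left untouched here; if that matters too, wrapping the value in `normalize-space()` would be one option, at the cost of also collapsing white space inside the item text.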
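The code is XSLT 3, so it works with all editions of Saxon 9.8, with Saxon 9.7 PE or EE, and with Altova 2017 or 2018. If you need XSLT 2, replace all the `xsl:mode` elements with the identity transformation template: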
<xsl:template match="@* | node()" mode="#all">
<xsl:copy>
<xsl:apply-templates select="@* | node()" mode="#current"/>
</xsl:copy>
</xsl:template>
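As for the empty `p` elements that are swallowed next to a list, the following is a hedged sketch of one possible adjustment, not part of the answer as originally tested. It assumes that only the empty `p` elements between two `item` elements should disappear, while those before the first or after the last `item` of a group should be kept, as in the desired output. The variable names `first` and `last` are introduced here for illustration. The template would replace the `*[item]` template in mode `lists` and uses only XSLT 2.0 features, so it should fit either version of the stylesheet.

```xml
<!-- Sketch: replaces the *[item] template in mode "lists". -->
<xsl:template match="*[item]" mode="lists">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:for-each-group select="node()" group-adjacent="boolean(self::item | self::p[not(node())])">
      <xsl:choose>
        <xsl:when test="current-grouping-key() and current-group()[self::item]">
          <!-- first and last item of this run of items and empty p elements -->
          <xsl:variable name="first" select="current-group()[self::item][1]"/>
          <xsl:variable name="last" select="current-group()[self::item][last()]"/>
          <!-- empty p elements before the first item stay outside the list -->
          <xsl:apply-templates select="current-group()[. &lt;&lt; $first]" mode="#current"/>
          <list type="{if (current-group()[self::item[@numeric = 'true']]) then 'number' else 'bullet'}">
            <!-- from first to last item; mode "strip" drops the p elements in between -->
            <xsl:apply-templates select="current-group()[not(. &lt;&lt; $first) and not(. &gt;&gt; $last)]" mode="strip"/>
          </list>
          <!-- empty p elements after the last item are kept as well -->
          <xsl:apply-templates select="current-group()[. &gt;&gt; $last]" mode="#current"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()" mode="#current"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
```

With the sample input, this should leave the `p` after `Rules:` and the `p` before `But I have never seen a cat` in place, while still removing the `p` elements that only separate list items.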