How to remove duplicate entries - XSLT

Date: 2021-01-25 16:17:03

Tags: xslt xslt-2.0 xslt-3.0

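I am trying to remove duplicate entries that follow the &#x00a7; entity. If an entry contains commas and, after tokenize, a token starts with an opening parenthesis "(", that token should be completed from the first token of the entry. For example, 17200(b)(2), (4)–(6) should become <p>17200(b)(2)</p><p>17200(b)(4)–(6)</p>.
Input XML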

<root>
    <p>CC &#x00a7;1(a), (b), (c)</p>
    <p>Civil Code &#x00a7;1(a), (b)</p>
    <p>CC &#x00a7;&#x00a7;2(a)</p>
    <p>Civil Code &#x00a7;3(a)</p>
    <p>CC &#x00a7;1(c)</p>
    <p>Civil Code &#x00a7;1(a), (b), (c)</p>
    <p>Civil Code &#x00a7;17200(b)(2), (4)–(6), (8), (12), (16), (20), and (21)</p>
</root>

Expected output

<root>
   <sec specific-use="CC">
      <title content-type="Sta_Head3">CIVIL CODE</title>
      <p>1(a)</p>
      <p>1(b)</p>
      <p>1(c)</p>
      <p>2(a)</p>
      <p>3(a)</p>
      <p>17200(b)(2)</p>
      <p>17200(b)(4)–(6)</p>
      <p>17200(b)(8)</p>
      <p>17200(b)(12)</p>
      <p>17200(b)(16)</p>
      <p>17200(b)(20)</p>
      <p>17200(b)(21)</p>
   </sec>
</root>

XSLT code

<xsl:template match="root">
    <xsl:copy>
        <xsl:for-each-group select="p[(starts-with(., 'CC ') or starts-with(., 'Civil Code'))]" group-by="replace(substring-before(., ' &#x00a7;'), 'Civil Code', 'CC')">
            <xsl:text>&#x0A;</xsl:text>
            <sec specific-use="{current-grouping-key()}">
                <xsl:text>&#x0A;</xsl:text>
                <title content-type="Sta_Head3">CIVIL CODE</title>
                <xsl:for-each-group select="current-group()" group-by="replace(substring-after(., '&#x00a7;'), '&#x00a7;', '')">
                    <xsl:sort select="replace(current-grouping-key(), '[^0-9.].*$', '')" data-type="number" order="ascending"/>
                    <xsl:for-each 
                        select="distinct-values(
                        current-grouping-key() ! 
                        (let $tokens := tokenize(current-grouping-key(), ', and |, | and ') 
                        return (head($tokens), tail($tokens) ! (substring-before(head($tokens), '(') || .)))
                        )" expand-text="yes">
                        <p>{.}</p>
                    </xsl:for-each>
                </xsl:for-each-group>
            </sec>
        </xsl:for-each-group>
    </xsl:copy>
</xsl:template>

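1 answer:

Answer 0 (score: 0)

You can do this with a two-step approach: first compute the full list of expanded p elements, then use for-each-group to remove the duplicates.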

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Step 1: expand every input p into one p per entry -->
    <xsl:variable name="listP">
        <xsl:apply-templates select="root/p"/>
    </xsl:variable>

    <!-- Step 2: group on the text value so each distinct entry is output once -->
    <xsl:for-each-group select="$listP/p" group-by=".">
        <p><xsl:value-of select="current-grouping-key()"/></p>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="p">
    <!-- Text after the section sign(s), e.g. "1(a), (b), (c)" -->
    <xsl:variable name="input" select="replace(substring-after(.,'&#x00a7;'),'&#x00a7;','')"/>
    <!-- Leading section number, e.g. "1" -->
    <xsl:variable name="chapter" select="substring-before($input,'(')"/>
    <xsl:for-each select="tokenize(substring-after($input, $chapter),',')">
        <!-- Strip spaces and the word "and", then prepend the section number -->
        <p><xsl:value-of select="concat($chapter,replace(replace(.,' ',''),'and',''))"/></p>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

See it working here: https://xsltfiddle.liberty-development.net/gVrvcxQ
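
Note that the stylesheet in this answer prepends only the leading section number, so for the last input paragraph it gives 17200(4)–(6) rather than the 17200(b)(4)–(6) the question asks for, and it does not emit the sec wrapper or sort the entries. The following is a minimal, untested sketch of how the same two-step idea might be extended. It assumes that a later token starting with "(" replaces only the last parenthesised group of the first entry. The function f:expand and the namespace urn:example:local are hypothetical names introduced here for illustration; they are not part of the question or the answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:local"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: split "17200(b)(2), (4)–(6), and (21)" into one
       complete entry per token. -->
  <xsl:function name="f:expand" as="xs:string*">
    <xsl:param name="list" as="xs:string"/>
    <!-- Same separator pattern as in the question -->
    <xsl:variable name="tokens" select="tokenize($list, ', and |, | and ')"/>
    <xsl:variable name="first" select="head($tokens)"/>
    <!-- Assumption: the shared prefix is the first token minus its last
         parenthesised group, e.g. 17200(b)(2) gives 17200(b) -->
    <xsl:variable name="prefix" select="replace($first, '\([^()]*\)$', '')"/>
    <xsl:sequence select="$first,
        tail($tokens) ! (if (starts-with(., '(')) then $prefix || . else .)"/>
  </xsl:function>

  <xsl:template match="root">
    <xsl:copy>
      <!-- Treat "Civil Code" and "CC" as the same code -->
      <xsl:for-each-group
          select="p[starts-with(., 'CC ') or starts-with(., 'Civil Code ')]"
          group-by="replace(substring-before(., ' &#x00a7;'), 'Civil Code', 'CC')">
        <sec specific-use="{current-grouping-key()}">
          <title content-type="Sta_Head3">CIVIL CODE</title>
          <!-- Step 1: expand every paragraph of the group into single entries.
               Step 2: group the strings on their own value to drop duplicates. -->
          <xsl:for-each-group
              select="current-group() !
                      f:expand(replace(substring-after(., '&#x00a7;'), '&#x00a7;', ''))"
              group-by=".">
            <!-- Sort on the leading section number; the sort is stable, so
                 entries with the same number keep first-seen order -->
            <xsl:sort select="number(replace(., '\D.*$', ''))"/>
            <p><xsl:value-of select="current-grouping-key()"/></p>
          </xsl:for-each-group>
        </sec>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The inner for-each-group is used instead of distinct-values() because the order of the values returned by distinct-values() is implementation-dependent, whereas groups are processed in order of first appearance. The sketch needs an XSLT 3.0 processor, since it relies on the "!" and "||" operators.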
