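I have a follow-up question to my earlier post: How to normalize XML on reverse domain name sorting and custom filtering
There are hundreds of questions here about removing duplicate tags. I tried to remove duplicate nodes by traversing the document and applying my own logic, but it does not seem to work: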
<?xml version='1.0' encoding='UTF-8' ?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
<a>blah</a>
<b>blah</b>
<maps>
<mapIndividual>
<src>
<scheme>https</scheme>
<domain>photos.yahoo.com</domain>
<path>somepath</path>
<query>blah</query>
</src>
<loc>C:\var\tmp</loc>
<x>blah</x>
<y>blah</y>
</mapIndividual>
<mapIndividual>
<src>
<domain>photos.yahoo.com</domain>
<path>somepath</path>
<query>blah</query>
</src>
<loc>C:\var\tmp</loc>
<x>blah</x>
<y>blah</y>
</mapIndividual>
<mapIndividual>
<src>
<scheme>tcp</scheme>
<domain>map.google.com</domain>
<port>80</port>
<path>/value</path>
<query>blah</query>
</src>
<tgt>
<scheme>https</scheme>
<domain>map.google.com</domain>
<port>443</port>
<path>/value</path>
<query>blah</query>
</tgt>
<loc>C:\var\tmp2</loc>
<x>blah</x>
<y>blah</y>
</mapIndividual>
<mapIndividual>
<src>
<scheme>tcp</scheme>
<domain>map.google.com</domain>
<path>/value</path>
<query>blah</query>
</src>
<tgt>
<domain>map.google.com</domain>
<path>/value</path>
<query>blah</query>
</tgt>
<loc>C:\var\tmp2</loc>
<x>blah</x>
<y>blah</y>
</mapIndividual>
<mapIndividual>
<src>
<scheme>http</scheme>
<domain>*.c.b.a</domain>
<path>somepath</path>
<port>8085</port>
<query>blah</query>
</src>
<tgt>
<domain>r.q.p</domain>
<path>somepath</path>
<query>blah</query>
</tgt>
<x>blah</x>
</mapIndividual>
<mapIndividual>
<src>
<scheme>http</scheme>
<domain>d.c.b.a</domain>
<path>somepath</path>
<port>8085</port>
<query>blah</query>
</src>
<tgt>
<domain>r.q.p</domain>
<path>somepath</path>
<query>blah</query>
</tgt>
<y>blah</y>
</mapIndividual>
</maps>
</mapGeo>
I have tried several approaches, in both XSLT 1.0 and XSLT 2.0, but I know I am making some mistake and cannot get it to work.
What I tried:
<xsl:key name="kPropertyByName" match="domain" use="text()" />
...
<xsl:template match="domain[not(generate-id() = generate-id(key('kPropertyByName', text())[1]))]"/>
<xsl:key name="property" match="mapIndividual" use="concat(generate-id(parent::*), scheme, '|', domain, '|', port, '|', path, '|', query)" />
...
<xsl:apply-templates select="mapIndividual/src[generate-id(.) = generate-id(key('property', concat(generate-id(parent::*), scheme, '|', domain, '|', port, '|', path, '|', query))[1])]" />
<xsl:for-each-group select="mapIndividual" group-by="domain">
<xsl:sequence select="."/>
</xsl:for-each-group>
The rest of my code is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xdt="http://www.w3.org/2005/xpath-datatypes"
version="2.0">
<xsl:output method="xml" encoding="utf-8" indent="yes" />
<xsl:strip-space elements="*" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<!-- not working -->
<!--
<xsl:key name="kPropertyByName" match="domain" use="text()" />
<xsl:key name="property" match="src" use="concat(generate-id(parent::*), schema, '|', domain, '|', port, '|', path, '|', query)" />
-->
<xsl:template match="maps">
<xsl:copy>
<xsl:apply-templates select="*">
<xsl:sort select="src/domain" />
<xsl:sort select="src/port" />
<xsl:sort select="src/path" />
<xsl:sort select="src/query" />
</xsl:apply-templates>
</xsl:copy>
<!-- not working -->
<!--
<xsl:apply-templates select="mapIndividual/src[generate-id(.) = generate-id(key('property', concat(generate-id(parent::*), schema, '|', domain, '|', port, '|', path, '|', query))[1])]" />
-->
</xsl:template>
<!-- not working -->
<!--
<xsl:template
match="domain[
not(
generate-id() =
generate-id(key('kPropertyByName', text())[1])
)
]"/>
-->
<xsl:template match="scheme[text() = '' or text() = 'http' or text() = 'https']" />
<xsl:template match="port[text() = '80' or text() = '443']" />
<xsl:template match="*[not(@*|*|comment()|processing-instruction()) and normalize-space()='']" />
</xsl:stylesheet>
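The following needs to be considered:
<scheme>https</scheme> appears in only one place, but the first two <mapIndividual> nodes with <domain>photos.yahoo.com</domain> are duplicates. Likewise, the <mapIndividual> nodes with <domain>map.google.com</domain> are duplicates, even though <scheme>https</scheme> and <port>443</port> may or may not be present. <scheme> can be an empty tag, an empty string, http, or https, while the <port> tag can be an empty tag, an empty string, 80, or 443. Please help, and thank you in advance!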
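Answer 0 (score: 0)
Here is an approach that first runs the input through a separate mode to strip scheme and port when they hold the default values described in the question, and then uses XSLT 3 composite grouping to eliminate the duplicates: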
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes"/>
<xsl:mode name="default" on-no-match="shallow-copy"/>
<xsl:template match="scheme[normalize-space() = ('http', 'https', '')]" mode="default"/>
<xsl:template match="port[normalize-space() = ('', '80', '443')]" mode="default"/>
<xsl:variable name="defaults-stripped">
<xsl:apply-templates mode="default"/>
</xsl:variable>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="/">
<xsl:apply-templates select="$defaults-stripped/node()"/>
</xsl:template>
<xsl:template match="maps">
<xsl:copy>
<xsl:for-each-group
select="mapIndividual" composite="yes"
group-by="src ! (domain, string(scheme), string(path), string(port))">
<xsl:sequence select="."/>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/94rmq6E/1
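This is only meant as an example; I am not sure I have grasped the exact requirements for stripping the default values, or for identifying duplicates based on the various child or descendant elements.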
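Since the question mentions XSLT 2.0, the same idea can probably be sketched without xsl:mode or composite="yes". The version below is a single-pass sketch under a few assumptions of mine, not a tested solution: each src has at most one scheme, domain, port and path; none of those values contains the '|' character used as a separator; and only the src children decide what counts as a duplicate (tgt and query are ignored, as in the stylesheet above).

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop scheme/port elements that only carry a default value. -->
  <xsl:template match="scheme[normalize-space() = ('', 'http', 'https')]"/>
  <xsl:template match="port[normalize-space() = ('', '80', '443')]"/>

  <xsl:template match="maps">
    <xsl:copy>
      <!-- Build one string key per mapIndividual. Default scheme/port values
           are treated as absent so they cannot make two entries look different.
           Assumption: '|' never occurs inside the values themselves. -->
      <xsl:for-each-group select="mapIndividual"
          group-by="string-join((
                      string(src/domain),
                      string(src/scheme[not(normalize-space() = ('', 'http', 'https'))]),
                      string(src/port[not(normalize-space() = ('', '80', '443'))]),
                      string(src/path)), '|')">
        <!-- Inside the group the context item is its first member;
             output only that one, with the default values stripped. -->
        <xsl:apply-templates select="."/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

If query or the tgt children should also take part in the comparison, they would need to be added to the string-join sequence in the same way. The input has to be well-formed for any of this to run, so the maps element must be closed with </maps>.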