XPath - Get nodes with distinct text values from a non-distinct list

Asked: 2015-09-01 20:55:32

Tags: xml xpath xpath-1.0

I have XML like the following:

<object>
    <codes>
        <cd1>A</cd1>
        <cd2>B</cd2>
        <cd3>C</cd3>
    </codes>
    <codes>
        <cd1>A</cd1>
        <cd2>D</cd2>
        <cd3></cd3>
    </codes>
    <codes>
        <cd1>E</cd1>
        <cd2>D</cd2>
        <cd3></cd3>
    </codes>
</object>

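So far, my XPath has evolved as follows:

  1. //cd1|//cd2|//cd3: gets all cd1, cd2, and cd3 elements.

  2. (//cd1|//cd2|//cd3)[text()[1]]: filters the list above down to the elements that have a non-empty text value, and returns the following elements.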

    <cd1>A</cd1> <cd2>B</cd2> <cd3>C</cd3> <cd1>A</cd1> <cd2>D</cd2> <cd1>E</cd1> <cd2>D</cd2>

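  3. Now I need to remove the elements with duplicate text values. I tried the XPath (//cd1|//cd2|//cd3)[text()[1]][(preceding::cd1)|(preceding::cd2)|(preceding::cd3)]. What I was hoping to achieve was to check whether the value already appears in any of the preceding cd1, cd2, or cd3 elements. But this returns the following, where <cd2>D</cd2> is still repeated.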

    <cd2>B</cd2> <cd3>C</cd3> <cd1>A</cd1> <cd2>D</cd2> <cd1>E</cd1> <cd2>D</cd2>

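  4. How can I write an XPath that solves (3) above?

    Please note that I have to use XPath 1.0, so the distinct-values function is not an option. Also, I need to get the list of matching nodes rather than the text values from the XPath, because I have to do further processing on those nodes with AXIOM.

    Update: I am using this XPath to get the matching elements and then processing them with AXIOM. So I need a single XPath expression that returns the matching elements in one go (I cannot write custom flows in AXIOM or use XSLT). Also, cd* cannot be used, because the real element names do not match that pattern; the names here are only a sample.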
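A note on why the attempt in (3) behaves as described: the predicate [(preceding::cd1)|(preceding::cd2)|(preceding::cd3)] is true whenever any such element precedes the node, and it never compares text values. The sketch below emulates that in plain Python, because the standard-library ElementTree module does not support the preceding axis. The function name is made up for illustration, and the emulation assumes the cd elements are never nested inside each other, so "earlier in document order" and "preceding" coincide.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<object>
    <codes><cd1>A</cd1><cd2>B</cd2><cd3>C</cd3></codes>
    <codes><cd1>A</cd1><cd2>D</cd2><cd3></cd3></codes>
    <codes><cd1>E</cd1><cd2>D</cd2><cd3></cd3></codes>
</object>"""

NAMES = {"cd1", "cd2", "cd3"}


def step3_existence_only(xml_text):
    """Hypothetical emulation of the predicate tried in step 3.

    A non-empty cd element is kept as soon as ANY cd element precedes it;
    the text of the preceding elements is never looked at.
    """
    root = ET.fromstring(xml_text)
    kept = []
    has_preceding = False  # becomes True once one cd element has been passed
    for elem in root.iter():  # iter() walks the tree in document order
        if elem.tag not in NAMES:
            continue
        if elem.text and has_preceding:
            kept.append((elem.tag, elem.text))
        has_preceding = True
    return kept


for tag, text in step3_existence_only(SAMPLE):
    print(tag, text)
```

Only the very first cd1 is dropped, since it is the only element with nothing before it, which matches the list shown in (3).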

3 answers:

Answer 0 (score: 1)

One way I found is with the following template:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
    <!-- name() is used here because node-name() is an XPath 2.0 function -->
    <xsl:for-each select="//*[starts-with(name(), 'cd')]">
        <xsl:variable name="content"><xsl:value-of select="text()"/></xsl:variable>
        <!-- keep the element only if it has text and no preceding cd* element has the same text -->
        <xsl:if test="count(preceding::*[starts-with(name(), 'cd') and text() = $content]) = 0 and text()">
            <xsl:copy-of select="."/>
        </xsl:if>
    </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

This takes all cd* elements and stores the text content of each one in a variable. It then counts how many preceding cd* elements have the same content; if that count is 0, the element is copied to the output.

As far as I know, this comparison needs a variable in XSLT 1.0 (the current() function is the other option). Inside the predicate, the context node becomes each preceding element, so the text of the outer element being tested is no longer reachable unless it has been captured beforehand.

Hope this helps.
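To make the counting logic concrete, here is a minimal sketch of the same idea in plain Python rather than XSLT. It is only an emulation: first_occurrences and its prefix parameter are names invented for this example, and it assumes the cd* elements are not nested, so every earlier cd* element counts as "preceding".

```python
import xml.etree.ElementTree as ET

SAMPLE = """<object>
    <codes><cd1>A</cd1><cd2>B</cd2><cd3>C</cd3></codes>
    <codes><cd1>A</cd1><cd2>D</cd2><cd3></cd3></codes>
    <codes><cd1>E</cd1><cd2>D</cd2><cd3></cd3></codes>
</object>"""


def first_occurrences(xml_text, prefix="cd"):
    """Hypothetical helper: keep an element if no earlier element has its text."""
    root = ET.fromstring(xml_text)
    # Same selection as //*[starts-with(name(), 'cd')], in document order.
    candidates = [e for e in root.iter() if e.tag.startswith(prefix)]
    kept = []
    for position, elem in enumerate(candidates):
        content = elem.text  # plays the role of the $content variable
        if not content:
            continue  # mirrors the trailing "and text()" test
        # Mirrors count(preceding::*[... and text() = $content]).
        same_before = sum(1 for p in candidates[:position] if p.text == content)
        if same_before == 0:
            kept.append((elem.tag, elem.text))
    return kept


for tag, text in first_occurrences(SAMPLE):
    print(tag, text)
```

Note that this compares across all cd* names, so a value is kept only the first time it appears in any of them.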

Answer 1 (score: 1)

This is actually pretty straightforward Muenchian grouping, just with three keys:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes" method="xml" />
<xsl:key name="cd1" match="//cd1" use="text()" />
<xsl:key name="cd2" match="//cd2" use="text()" />
<xsl:key name="cd3" match="//cd3" use="text()" />

<xsl:template match="/">    
    <xsl:for-each select="/object/codes/cd1[./text() != '' and count(. | key('cd1', .)[1]) = 1]">
        <xsl:copy-of select="." />
    </xsl:for-each>

    <xsl:for-each select="/object/codes/cd2[./text() != '' and count(. | key('cd2', .)[1]) = 1]">
        <xsl:copy-of select="." />
    </xsl:for-each>
    <xsl:for-each select="/object/codes/cd3[./text() != '' and count(. | key('cd3', .)[1]) = 1]">
        <xsl:copy-of select="." />
    </xsl:for-each>

</xsl:template>
</xsl:stylesheet>

Output:

<?xml version="1.0" encoding="UTF-8"?>
<cd1>A</cd1>
<cd1>E</cd1>
<cd2>B</cd2>
<cd2>D</cd2>
<cd3>C</cd3>

Alternatively, if you want to group them regardless of node name (e.g. when a cd1 and a cd2 both have A as their text value, only one of them should be kept), it's a little less straightforward.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes" method="xml" />
<xsl:key name="cd" match="//cd1 | //cd2 | //cd3" use="text()" />

<xsl:template match="/">    
    <xsl:for-each select="/object/codes/cd1[./text() != '' and count(. | key('cd', .)[1]) = 1] | /object/codes/cd2[./text() != '' and count(. | key('cd', .)[1]) = 1] | /object/codes/cd3[./text() != '' and count(. | key('cd', .)[1]) = 1]">
        <xsl:copy-of select="." />
    </xsl:for-each>


</xsl:template>
</xsl:stylesheet>

For this input it returns the same elements as above, but in document order (the order your current expression returns them in). It would also eliminate duplicates between a cd1, cd2, or cd3 sharing the same text, keeping only the first element that has it.

Also note that I'm ignoring empty nodes, which may not be what you want. That is easily changed by removing ./text() != '' from the selectors. However, a different method would then be needed to eliminate duplicate empty nodes, probably a series of templates or xsl:if tests that check for an empty node and output a single one if any exist.
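For readers less familiar with Muenchian grouping, the two stylesheets can be sketched in plain Python as an index from text value to elements, where an element is kept only if it is the first entry of its own group. This is an emulation under stated assumptions, not XSLT: distinct_by_key is a made-up name, and the identity check stands in for count(. | key(...)[1]) = 1.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<object>
    <codes><cd1>A</cd1><cd2>B</cd2><cd3>C</cd3></codes>
    <codes><cd1>A</cd1><cd2>D</cd2><cd3></cd3></codes>
    <codes><cd1>E</cd1><cd2>D</cd2><cd3></cd3></codes>
</object>"""


def distinct_by_key(xml_text, names):
    """Hypothetical helper: first element per text value among the given names."""
    root = ET.fromstring(xml_text)
    # Plays the role of xsl:key: text value -> elements in document order.
    index = defaultdict(list)
    for elem in root.iter():
        if elem.tag in names and elem.text:
            index[elem.text].append(elem)
    # An element survives only if it IS the first element of its group.
    return [
        (elem.tag, elem.text)
        for elem in root.iter()
        if elem.tag in names and elem.text and index[elem.text][0] is elem
    ]


# One key per element name, as in the first stylesheet.
per_name = (
    distinct_by_key(SAMPLE, {"cd1"})
    + distinct_by_key(SAMPLE, {"cd2"})
    + distinct_by_key(SAMPLE, {"cd3"})
)
# A single key shared by all three names, as in the second stylesheet.
across_names = distinct_by_key(SAMPLE, {"cd1", "cd2", "cd3"})

print(per_name)
print(across_names)
```

With the sample input both variants select the same five elements; only the order differs, because the per-name variant processes cd1, then cd2, then cd3.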

Answer 2 (score: 1)

Try this:

//cd1[not(text() = preceding::cd1/text())][normalize-space()]|
//cd2[not(text() = preceding::cd2/text())][normalize-space()]|
//cd3[not(text() = preceding::cd3/text())][normalize-space()]
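Note that each branch of this expression compares an element only against preceding elements of the same name. On the sample data that is enough, but a cd2 holding A would still be returned next to a cd1 holding A. If duplicates should be removed across names, the same idea could presumably be written as (//cd1|//cd2|//cd3)[normalize-space()][not(. = (preceding::cd1|preceding::cd2|preceding::cd3))]. The sketch below emulates the per-name behaviour in plain Python to show the difference. It is an illustration only: distinct_per_name and the VARIANT document are invented for this example, and non-nested elements are assumed.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<object>
    <codes><cd1>A</cd1><cd2>B</cd2><cd3>C</cd3></codes>
    <codes><cd1>A</cd1><cd2>D</cd2><cd3></cd3></codes>
    <codes><cd1>E</cd1><cd2>D</cd2><cd3></cd3></codes>
</object>"""

# Hypothetical input in which two differently named elements share a value.
VARIANT = "<object><codes><cd1>A</cd1><cd2>A</cd2></codes></object>"


def distinct_per_name(xml_text, names=("cd1", "cd2", "cd3")):
    """Hypothetical emulation: dedupe each element name independently."""
    root = ET.fromstring(xml_text)
    seen = {name: set() for name in names}  # one "preceding texts" set per name
    kept = []
    for elem in root.iter():  # document order
        if elem.tag not in seen:
            continue
        text = elem.text
        # [normalize-space()] drops empty or whitespace-only elements;
        # the not(text() = preceding::cdN/text()) part is the membership test.
        if text and text.strip() and text not in seen[elem.tag]:
            kept.append((elem.tag, text))
        if text:
            seen[elem.tag].add(text)
    return kept


print(distinct_per_name(SAMPLE))
print(distinct_per_name(VARIANT))
```

On VARIANT both elements are kept, which is the case where the per-name expression and a cross-name comparison would give different results.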