XML/XSLT: remove duplicate elements based on some (but not all) child elements

Time: 2016-12-16 10:18:07

Tags: xml xslt duplicates nodes elements

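After backing up my Windows Phone to a .msg archive (which is essentially .xml), I ended up with duplicate entries. I tidied up the XML for readability. Out of thousands of entries, here are 6 (3 duplicate outgoing messages and 3 duplicate incoming messages):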

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Message>
        <Recepients>
            <string>OutgoingNUMB</string>
        </Recepients>
        <Body>OutgoingMESG</Body>
        <IsIncoming>false</IsIncoming>
        <IsRead>true</IsRead>
        <Attachments />
        <LocalTimestamp>130000350501770000</LocalTimestamp>
        <Sender />
    </Message>
    <Message>
        <Recepients>
            <string>OutgoingNUMB</string>
        </Recepients>
        <Body>OutgoingMESG</Body>
        <IsIncoming>false</IsIncoming>
        <IsRead>true</IsRead>
        <Attachments />
        <LocalTimestamp>130000350501770000</LocalTimestamp>
        <Sender />
    </Message>
    <Message>
        <Recepients>
            <string>OutgoingNUMB</string>
        </Recepients>
        <Body>OutgoingMESG</Body>
        <IsIncoming>false</IsIncoming>
        <IsRead>true</IsRead>
        <Attachments />
        <LocalTimestamp>130000350501760000</LocalTimestamp>
        <Sender />
    </Message>
    <Message>
        <Recepients />
        <Body>IncomingMESG</Body>
        <IsIncoming>true</IsIncoming>
        <IsRead>true</IsRead>
        <Attachments />
        <LocalTimestamp>130000349290000000</LocalTimestamp>
        <Sender>IncomingNUMB</Sender>
    </Message>
    <Message>
        <Recepients />
        <Body>IncomingMESG</Body>
        <IsIncoming>true</IsIncoming>
        <IsRead>true</IsRead>
        <Attachments />
        <LocalTimestamp>130000349234630000</LocalTimestamp>
        <Sender>IncomingNUMB</Sender>
    </Message>
    <Message>
        <Recepients />
        <Body>IncomingMESG</Body>
        <IsIncoming>true</IsIncoming>
        <IsRead>true</IsRead>
        <Attachments />
        <LocalTimestamp>130000349234630000</LocalTimestamp>
        <Sender>IncomingNUMB</Sender>
    </Message>
</ArrayOfMessage>

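We have 3 outgoing messages that are essentially one and the same message, and 3 incoming messages that are likewise essentially one message. The outgoing format differs from the incoming format, but that does not really matter. Note that duplicates may carry different "LocalTimestamp" values, probably because the phone messed up. LocalTimestamp should therefore not be one of the fields compared when deciding whether a message is a duplicate.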

What I want is for XSLT to remove those duplicate entries so that I am left with only the following:

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Message>
        <Recepients>
            <string>OutgoingNUMB</string>
        </Recepients>
        <Body>OutgoingMESG</Body>
        <IsIncoming>false</IsIncoming>
        <IsRead>true</IsRead>
        <Attachments />
        <LocalTimestamp>130000350501760000</LocalTimestamp>
        <Sender />
    </Message>
    <Message>
        <Recepients />
        <Body>IncomingMESG</Body>
        <IsIncoming>true</IsIncoming>
        <IsRead>true</IsRead>
        <Attachments />
        <LocalTimestamp>130000349234630000</LocalTimestamp>
        <Sender>IncomingNUMB</Sender>
    </Message>
</ArrayOfMessage>

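The filter should detect duplicate "Message" elements, meaning ones whose children are identical ("Recepients", with or without a nested "string"; "Body"; "IsIncoming"; "Attachments"; "Sender"), while ignoring "IsRead" and "LocalTimestamp". Within each set of duplicates, the entry with the oldest (i.e. numerically lowest) "LocalTimestamp" should be kept and the rest discarded.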

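I suspect a single filter could handle both cases (regardless of the incoming or outgoing message structure), but I leave that up to you.

Thanks in advance for your help.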

== UPDATE

I did some research and came up with this (I am currently checking it against a large database for errors)...

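<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- Index each Message by the compared fields; IsRead and LocalTimestamp are left out on purpose.
     Assumes the '|' separator does not occur inside the compared fields. -->
<xsl:key name="dup" match="Message"
     use="concat(Recepients, '|', Body, '|', IsIncoming, '|', Attachments, '|', Sender)"/>
<!-- Identity template: copy everything that no more specific template matches. -->
<xsl:template match="@*|node()">
<xsl:copy>
     <xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<!-- Drop every Message that is not the last one with its key. Testing each child against
     following::Message separately would also fire when the fields match DIFFERENT messages.
     This keeps the last occurrence; it does not yet select the lowest LocalTimestamp. -->
<xsl:template match="Message[
generate-id() != generate-id(
     key('dup', concat(Recepients, '|', Body, '|', IsIncoming, '|', Attachments, '|', Sender))[last()])
]"/>
</xsl:stylesheet>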
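
The requirement to keep the entry with the lowest "LocalTimestamp" from each set of duplicates could be sketched with Muenchian grouping in XSLT 1.0. This is only a sketch under a few assumptions: the key name `msg-by-content` and the `|` separator are illustrative choices, the separator is assumed not to occur in the compared fields, and all `Message` elements are assumed to be direct children of `ArrayOfMessage`. The timestamps are sorted as numbers, and values this large exceed the exact integer range of a double, so two timestamps that differ by only a few units might compare as equal.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Group key: every compared field; IsRead and LocalTimestamp are ignored. -->
  <xsl:key name="msg-by-content" match="Message"
           use="concat(Recepients, '|', Body, '|', IsIncoming, '|', Attachments, '|', Sender)"/>

  <xsl:template match="ArrayOfMessage">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Visit one representative per group: the first Message carrying each key. -->
      <xsl:for-each select="Message[generate-id() = generate-id(
          key('msg-by-content',
              concat(Recepients, '|', Body, '|', IsIncoming, '|', Attachments, '|', Sender))[1])]">
        <xsl:variable name="k"
            select="concat(Recepients, '|', Body, '|', IsIncoming, '|', Attachments, '|', Sender)"/>
        <!-- Sort the whole group by timestamp and copy only its oldest member. -->
        <xsl:for-each select="key('msg-by-content', $k)">
          <xsl:sort select="LocalTimestamp" data-type="number" order="ascending"/>
          <xsl:if test="position() = 1">
            <xsl:copy-of select="."/>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the six sample entries, this should leave one outgoing message (LocalTimestamp 130000350501760000) and one incoming message (LocalTimestamp 130000349234630000). Groups appear in the order of their first occurrence in the input, and because the same key covers both layouts, a single rule handles incoming and outgoing messages alike.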

0 answers:

No answers