How to remove a child node from the root when element values match and are duplicated, in Python

Time: 2017-05-24 06:25:44

Tags: python xml

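I am trying to parse XML to find duplicate values. If a value is duplicated, I need to remove the entire element block that contains it, in Python. For example: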

<?xml version="1.0" encoding="UTF-8"?><group>
<list-service uri="sip:accc@msg.pc.t-data.com"/>
<hunt xmlns:ht="http://www.t-data.com/xml/hunt" uri="sip:17738078709@msg.pc.t-data.com">
<ht:list>
<ht:huntItem>
<ht:huntUri>17753720@msg.pc.t-data.com</ht:huntUri>
<ht:userId>U-1-f0c8-431c-84fa-6f0dfc6b22de</ht:userId>
</ht:huntItem>
<ht:huntItem>
<ht:huntUri>19462562@msg.pc.t-data.com</ht:huntUri>
<ht:userId>U-1-f0c8-431c-84fa-6f0dfc6b22de</ht:userId>
</ht:huntItem>
<ht:huntItem>
<ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri>
<ht:userId>U-1-f0c8-431c-84fa-6f0dfc6b22de</ht:userId>
<ht:deviceId>urnmei:-131893-0</ht:deviceId>
</ht:huntItem>
<ht:huntItem>
<ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri>
<ht:userId>U-1-f0c8-431c-84fa-6f0dfc6b22de</ht:userId>
<ht:deviceId>urnmei:35775808-001226-0</ht:deviceId>
</ht:huntItem>
</ht:list>
</hunt>
</group>

In the XML above, we need to check for the duplicate value 15668433@msg.pc.t-data.com

<ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri> 

If a duplicate is found, remove it.

I was able to find and list the values with the following code.

from xml.dom import minidom

def getChildUsers(source, string):
    """Return the text of every element named `string` in the XML `source`."""
    try:
        data = minidom.parseString(source)
        result = []
        for att in data.getElementsByTagName(string):
            result.append(att.firstChild.nodeValue)
        return result
    except Exception:
        # Returns None on failure, as in the original version.
        print('users fetch issue')
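
Once the values are collected into a list, the repeated ones can be picked out by counting occurrences. The following is a minimal sketch, assuming the input has the shape of the list returned by getChildUsers; find_duplicates is a hypothetical helper name that does not appear in the original code.

```python
from collections import Counter

def find_duplicates(values):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)
    seen = set()
    duplicates = []
    for value in values:
        if counts[value] > 1 and value not in seen:
            seen.add(value)
            duplicates.append(value)
    return duplicates

# The four huntUri values from the sample XML in the question.
uris = [
    "17753720@msg.pc.t-data.com",
    "19462562@msg.pc.t-data.com",
    "15668433@msg.pc.t-data.com",
    "15668433@msg.pc.t-data.com",
]
duplicates = find_duplicates(uris)
print(duplicates)
```

Each duplicated value is reported once, however many times it repeats, which is enough to decide which huntItem blocks need attention.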

1 Answer:

Answer 0 (score: 0)

I was able to achieve it with the following code; I hope it helps someone.

    # Split userList into first occurrences (dataF) and repeats (matchF).
    # dataF and matchF are assumed to be lists created earlier.
    for i in userList:
        if i not in dataF:
            dataF.append(i)
        else:
            matchF.append(i)
    print(matchF)
    if len(matchF) > 0:
        for page in root:                     # children of <group>, e.g. <hunt>
            for elem in page:                 # e.g. <ht:list>
                for dat in matchF:
                    # Remove the first huntItem whose huntUri equals dat.
                    for ev in list(elem):     # copy, because elem is modified
                        if any(e.tag.split('}')[-1] == 'huntUri' and e.text == dat
                               for e in ev):
                            elem.remove(ev)
                            break
        tree.write("out.xml")
        writeF=open(processed_path+"/"+number+"_Final.xml","w")
        dataFile='++'.join(a.strip() for a in open('out.xml','r').readlines())
        data1=dataFile.replace('ns1:','ht:').replace(':ns0','').replace('ns0:','').replace(':ns1',':ht')
        listData=data1.split('++')
        listData[0]='<group'+namespace+'>'
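
For comparison, the same cleanup can be sketched with xml.etree.ElementTree alone. This is a minimal sketch under stated assumptions, not the code from the answer: it keeps the first huntItem for each huntUri and drops later ones (the answer's loop removes the first match instead), it tracks duplicates per ht:list, and remove_duplicate_hunt_items is a hypothetical function name. Calling ET.register_namespace keeps the ht: prefix in the output, so the ns0/ns1 string replacement should not be needed for that prefix.

```python
import xml.etree.ElementTree as ET

HT_NS = "http://www.t-data.com/xml/hunt"

# Serialize this namespace with the ht: prefix instead of ns0/ns1.
ET.register_namespace("ht", HT_NS)

def remove_duplicate_hunt_items(xml_text):
    """Drop every ht:huntItem whose ht:huntUri was already seen in its list."""
    root = ET.fromstring(xml_text)
    for hunt_list in root.iter("{%s}list" % HT_NS):
        seen = set()
        # Iterate over a copy because hunt_list is modified in the loop.
        for item in list(hunt_list):
            uri = item.findtext("{%s}huntUri" % HT_NS)
            if uri is None:
                continue
            if uri in seen:
                hunt_list.remove(item)
            else:
                seen.add(uri)
    return ET.tostring(root, encoding="unicode")

# Shortened version of the sample XML from the question.
sample = """<group>
<hunt xmlns:ht="http://www.t-data.com/xml/hunt" uri="sip:17738078709@msg.pc.t-data.com">
<ht:list>
<ht:huntItem><ht:huntUri>19462562@msg.pc.t-data.com</ht:huntUri></ht:huntItem>
<ht:huntItem><ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri><ht:deviceId>urnmei:-131893-0</ht:deviceId></ht:huntItem>
<ht:huntItem><ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri><ht:deviceId>urnmei:35775808-001226-0</ht:deviceId></ht:huntItem>
</ht:list>
</hunt>
</group>"""

cleaned = remove_duplicate_hunt_items(sample)
print(cleaned)
```

If the later entry should survive instead of the earlier one, iterating over reversed(list(hunt_list)) with the same seen set would be one way to flip the choice.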