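I am trying to parse XML in Python to find duplicate values. When a value is duplicated, I need to remove the entire element block that contains it. For example: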
<?xml version="1.0" encoding="UTF-8"?>
<group>
  <list-service uri="sip:accc@msg.pc.t-data.com"/>
  <hunt xmlns:ht="http://www.t-data.com/xml/hunt" uri="sip:17738078709@msg.pc.t-data.com">
    <ht:list>
      <ht:huntItem>
        <ht:huntUri>17753720@msg.pc.t-data.com</ht:huntUri>
        <ht:userId>U-1-f0c8-431c-84fa-6f0dfc6b22de</ht:userId>
      </ht:huntItem>
      <ht:huntItem>
        <ht:huntUri>19462562@msg.pc.t-data.com</ht:huntUri>
        <ht:userId>U-1-f0c8-431c-84fa-6f0dfc6b22de</ht:userId>
      </ht:huntItem>
      <ht:huntItem>
        <ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri>
        <ht:userId>U-1-f0c8-431c-84fa-6f0dfc6b22de</ht:userId>
        <ht:deviceId>urnmei:-131893-0</ht:deviceId>
      </ht:huntItem>
      <ht:huntItem>
        <ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri>
        <ht:userId>U-1-f0c8-431c-84fa-6f0dfc6b22de</ht:userId>
        <ht:deviceId>urnmei:35775808-001226-0</ht:deviceId>
      </ht:huntItem>
    </ht:list>
  </hunt>
</group>
In the XML above, we need to check for the duplicate value 15668433@msg.pc.t-data.com:
<ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri>
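If a duplicate is found, the element containing it should be removed.
So far I am able to find and list the values with the following code.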
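from xml.dom import minidom

def getChildUsers(source, string):
    # Return the text of every element named `string` found in the XML text `source`.
    try:
        result = []
        data = minidom.parseString(source)
        elementlist = data.getElementsByTagName(string)
        for att in elementlist:
            result.append(att.firstChild.nodeValue)
        return result
    except Exception:
        # Parsing failed or an element was empty; report it and return None.
        print('users fetch issue')
        #raise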
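As a minimal sketch of the "find the duplicates" step, the values collected this way can be counted with collections.Counter. The helper name find_duplicates and the shortened SAMPLE document are hypothetical and only serve the illustration; the tag name ht:huntUri is taken from the XML above.

```python
from collections import Counter
from xml.dom import minidom

# Hypothetical sample input, shortened from the XML in the question.
SAMPLE = """<group>
<hunt xmlns:ht="http://www.t-data.com/xml/hunt">
<ht:list>
<ht:huntItem><ht:huntUri>17753720@msg.pc.t-data.com</ht:huntUri></ht:huntItem>
<ht:huntItem><ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri></ht:huntItem>
<ht:huntItem><ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri></ht:huntItem>
</ht:list>
</hunt>
</group>"""


def find_duplicates(source, tag):
    """Return the values of `tag` that occur more than once, in first-seen order."""
    doc = minidom.parseString(source)
    # Collect the text of every matching element, skipping empty ones.
    values = [node.firstChild.nodeValue
              for node in doc.getElementsByTagName(tag)
              if node.firstChild is not None]
    counts = Counter(values)
    return [value for value, n in counts.items() if n > 1]


duplicates = find_duplicates(SAMPLE, "ht:huntUri")
print(duplicates)
```

This only reports which values repeat; it does not yet remove anything from the document.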
Answer 0: (score: 0)
I was able to achieve this with the following code. I hope it helps someone.
# userList holds the collected huntUri values; dataF and matchF are lists
# defined earlier (dataF: values seen so far, matchF: values seen again).
for i in userList:
    if i not in dataF:
        dataF.append(i)
    else:
        matchF.append(i)
print(matchF)
if len(matchF) > 0:
    for page in root:  # iterate over pages
        for elem in page:
            for dat in matchF:
                num = 0
                for ev in elem:
                    for e in ev:
                        # Tags look like '{namespace-uri}huntUri'.
                        if e.tag.endswith('}huntUri') and dat == e.text:
                            num = num + 1
                            break
                    if num == 2:
                        # ev is now the second item carrying this value.
                        break
                if num == 2:
                    print(num)
                    # Remove the repeated item and keep the first one.
                    elem.remove(ev)
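# Write the pruned tree, then restore the original namespace prefixes,
# which come out as ns0/ns1 in the serialized file.
tree.write("out.xml")
writeF = open(processed_path + "/" + number + "_Final.xml", "w")
with open('out.xml', 'r') as f:
    dataFile = '++'.join(a.strip() for a in f.readlines())
data1 = dataFile.replace('ns1:', 'ht:').replace(':ns0', '').replace('ns0:', '').replace(':ns1', ':ht')
listData = data1.split('++')
listData[0] = '<group' + namespace + '>'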
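For comparison, here is a shorter sketch of the same idea with xml.etree.ElementTree, not the code above. It assumes the first ht:huntItem for each ht:huntUri should be kept and later ones dropped, and it treats each ht:list separately. The function name remove_duplicate_hunt_items and the shortened SAMPLE document are hypothetical. Calling ET.register_namespace keeps the ht prefix on output, which may make the string replacement of ns0/ns1 unnecessary.

```python
import xml.etree.ElementTree as ET

HT_NS = "http://www.t-data.com/xml/hunt"  # namespace URI from the question's XML
# Ask ElementTree to serialize this namespace with the "ht" prefix.
ET.register_namespace("ht", HT_NS)

# Hypothetical sample input, shortened from the XML in the question.
SAMPLE = """<group>
<hunt xmlns:ht="http://www.t-data.com/xml/hunt" uri="sip:17738078709@msg.pc.t-data.com">
<ht:list>
<ht:huntItem><ht:huntUri>17753720@msg.pc.t-data.com</ht:huntUri></ht:huntItem>
<ht:huntItem><ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri></ht:huntItem>
<ht:huntItem><ht:huntUri>15668433@msg.pc.t-data.com</ht:huntUri></ht:huntItem>
</ht:list>
</hunt>
</group>"""


def remove_duplicate_hunt_items(root):
    """Drop each ht:huntItem whose ht:huntUri was already seen in the same ht:list.

    Returns the number of removed items (assumption: the first occurrence is kept).
    """
    ns = {"ht": HT_NS}
    removed = 0
    for hunt_list in root.iter("{%s}list" % HT_NS):
        seen = set()
        # findall returns a list, so removing children while looping is safe.
        for item in hunt_list.findall("ht:huntItem", ns):
            uri = item.findtext("ht:huntUri", namespaces=ns)
            if uri is None:
                continue  # item without a huntUri: leave it alone
            if uri in seen:
                hunt_list.remove(item)
                removed += 1
            else:
                seen.add(uri)
    return removed


root = ET.fromstring(SAMPLE)
removed = remove_duplicate_hunt_items(root)
result = ET.tostring(root, encoding="unicode")
print(removed)
print(result)
```

One difference to be aware of: ElementTree writes the xmlns:ht declaration on the root element rather than on hunt, so the output is equivalent XML but not byte-identical to the input.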