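I have two XML files, shown below, and I want to use file A to check the order of file B (file B should follow the order of file A). I have also written a program, shown below, that takes care of the ordering; the only problem is that I cannot get the output written correctly to another XML file. Before asking here I did research how to write an edited XML file back to the source or to another file, but maybe I am missing something very small.
File A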
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1">
<p1:location hours = "1" path = '1'>
<p1:feature color="6" type="a">560</p1:feature>
</p1:location>
</p1:time>
<p1:time nTimestamp="2">
<p1:location hours = "1" path = '1'>
<p1:feature color="2" type="a">564</p1:feature>
</p1:location>
</p1:time>
<p1:time nTimestamp="3">
<p1:location hours = "1" path = '1'>
<p1:feature color="6" type="a">560</p1:feature>
</p1:location>
</p1:time>
</p1:sample1>
File B
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1">
<p1:location hours = "1" path = '1'>
<p1:feature color="6" type="a">560</p1:feature>
</p1:location>
</p1:time>
<p1:time nTimestamp="3">
<p1:location hours = "1" path = '1'>
<p1:feature color="6" type="a">560</p1:feature>
</p1:location>
</p1:time>
<p1:time nTimestamp="2">
<p1:location hours = "1" path = '1'>
<p1:feature color="2" type="a">564</p1:feature>
</p1:location>
</p1:time>
</p1:sample1>
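Just for your information, the only difference here is the order of the whole p1:time elements, identified by their nTimestamp attribute, together with their child elements location and feature. You can see that in file A the order is 1, 2, 3, ..., while in file B it is 1, 3, 2, ... (I am talking about the whole p1:time element and everything inside it).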
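The difference in order can be confirmed programmatically by listing the nTimestamp values in document order. The sketch below is only an illustration: it uses the standard-library xml.etree.ElementTree instead of lxml so that it runs without extra packages, and FILE_A, FILE_B and timestamp_order are hypothetical names, with trimmed-down copies of the two files (child elements omitted).

```python
import xml.etree.ElementTree as ET

NS = "http://www.example.org/eHorizon"

# Trimmed-down stand-ins for file A and file B (assumed, not the real files).
FILE_A = """<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1"/>
<p1:time nTimestamp="2"/>
<p1:time nTimestamp="3"/>
</p1:sample1>"""

FILE_B = """<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1"/>
<p1:time nTimestamp="3"/>
<p1:time nTimestamp="2"/>
</p1:sample1>"""


def timestamp_order(xml_text):
    """Return the nTimestamp values of the p1:time elements in document order."""
    root = ET.fromstring(xml_text)
    return [t.get("nTimestamp") for t in root.findall(f"{{{NS}}}time")]


order_a = timestamp_order(FILE_A)
order_b = timestamp_order(FILE_B)
print(order_a)             # ['1', '2', '3']
print(order_b)             # ['1', '3', '2']
print(order_a == order_b)  # False: file B does not follow file A
```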
What I want
from lxml import etree

recovering_parser = etree.XMLParser(recover=True)
Reference = etree.parse("C:/Users/your_location/Desktop/sample1.xml", parser=recovering_parser)
Copy = etree.parse("C:/Users/your_location/Desktop/sample2.xml", parser=recovering_parser)
ReferenceTest = Reference.findall("{http://www.example.org/eHorizon}time")  # find all time elements in sample1
CopyTest = Copy.findall("{http://www.example.org/eHorizon}time")  # find all time elements in sample2
a = []  # list for storing sample1's time elements
b = []  # list for storing sample2's time elements
new_list = []  # for storing sorted data
for i, j in zip(ReferenceTest, CopyTest):
    # store data as [(<Element {http://www.example.org/eHorizon}time at 0x213d738>, '1'), ...]
    # i.e. each 'time' element paired with its nTimestamp attribute ('1', '2' or '3')
    a.append((i, i.attrib.get("nTimestamp")))
    b.append((j, j.attrib.get("nTimestamp")))  # same as above

def sortTimestamps(a, b):  # reorder the elements of list 'b' so that they follow the sequence of list 'a'
    for i in a:
        for j in b:
            if i[1] == j[1]:
                s = a.index(i)
                t = b.index(j)
                b[t], b[s] = b[s], b[t]

sortTimestamps(a, b)  # call sort function
for i in b:
    new_list.append(i[0])  # store the sorted time elements in new_list
CopyTest = new_list  # assign new sorted list of time elements to old list
Copy.write("C:/Users/your_location/Desktop/output_data.xml")  # write data to another file and check results
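The code above does the job of sorting file B according to the order of file A. But when I write the result to another file, it writes file B's data as-is, that is, in the same order as shown in file B above. After sorting, I expect the order of file B's data to be changed, so that it is written in the order given in file A.
What I tried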
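Besides the program above, I tried reading more about writing files, but it got me nowhere. I checked the format of my XML and I believe it is completely fine. Finally, I also followed the tutorial here to see how it demonstrates writing, but that approach did not work either. Maybe you can help me.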
Edit: I removed the code from the link and added it here. I had originally linked it to keep the post short.
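A likely explanation for the unchanged output, offered as a sketch rather than a definitive diagnosis: findall returns a plain Python list, so sorting that list or rebinding the name CopyTest does not touch the tree that Copy.write serializes. The elements have to be moved inside the tree itself. The example below illustrates this with the standard-library xml.etree.ElementTree (swapped in for lxml so that it runs without extra packages) and inline XML strings; reorder_like, REFERENCE_XML and COPY_XML are hypothetical names, and placing timestamps that are missing from the reference at the end is an assumption.

```python
import xml.etree.ElementTree as ET

NS = "http://www.example.org/eHorizon"
TIME = f"{{{NS}}}time"
ET.register_namespace("p1", NS)  # keep the p1: prefix when serializing

# Inline stand-ins for file A and file B (assumed, shortened from the question).
REFERENCE_XML = """<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1"><p1:location hours="1" path="1"><p1:feature color="6" type="a">560</p1:feature></p1:location></p1:time>
<p1:time nTimestamp="2"><p1:location hours="1" path="1"><p1:feature color="2" type="a">564</p1:feature></p1:location></p1:time>
<p1:time nTimestamp="3"><p1:location hours="1" path="1"><p1:feature color="6" type="a">560</p1:feature></p1:location></p1:time>
</p1:sample1>"""

COPY_XML = """<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1"><p1:location hours="1" path="1"><p1:feature color="6" type="a">560</p1:feature></p1:location></p1:time>
<p1:time nTimestamp="3"><p1:location hours="1" path="1"><p1:feature color="6" type="a">560</p1:feature></p1:location></p1:time>
<p1:time nTimestamp="2"><p1:location hours="1" path="1"><p1:feature color="2" type="a">564</p1:feature></p1:location></p1:time>
</p1:sample1>"""


def reorder_like(reference_root, copy_root):
    """Reorder the p1:time children of copy_root in place to follow reference_root."""
    # Position of each nTimestamp value in the reference document.
    rank = {t.get("nTimestamp"): i
            for i, t in enumerate(reference_root.findall(TIME))}
    times = copy_root.findall(TIME)
    # Timestamps missing from the reference go to the end (an assumption).
    ordered = sorted(times, key=lambda t: rank.get(t.get("nTimestamp"), len(rank)))
    # Detach and re-attach: this changes the tree itself, not just a Python list.
    for t in times:
        copy_root.remove(t)
    copy_root.extend(ordered)


reference_root = ET.fromstring(REFERENCE_XML)
copy_root = ET.fromstring(COPY_XML)
reorder_like(reference_root, copy_root)

new_order = [t.get("nTimestamp") for t in copy_root.findall(TIME)]
result = ET.tostring(copy_root, encoding="unicode")
print(new_order)
print(result)
```

With the parsed trees from the question, the equivalent would presumably be to call the function on Reference.getroot() and Copy.getroot() and then call Copy.write(...) as before; lxml elements also provide findall, remove and extend. Each p1:time element moves together with its location and feature children, since they are part of the element being re-attached.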