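I have a large XML file, and I want to copy specific values from each entry into a new XML file (so the new file ends up much smaller than the original).
So far I have written the following code: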
#!/usr/bin/env python
from lxml import etree

dst = "/root/Downloads/lilmalxml.xml"      # reduced output file
dsttemp = "/root/Downloads/malxml.xml"     # large input file

filehandler = etree.parse(dsttemp)
rootElement = etree.Element("list")
doc = etree.ElementTree(rootElement)

for entry in filehandler.xpath("//entry"):
    virusname = entry.find("virusname").text
    url = entry.find("url").text
    ip = entry.find("ip").text
    # Build the new <url> element from a formatted XML string
    formatstr = "<url ip='{0}' virusname='{1}'><![CDATA[{2}]]></url>".format(ip, virusname, url)
    sub = etree.fromstring(formatstr)
    rootElement.append(sub)

# Binary mode, because write() emits encoded bytes
with open(dst, "wb") as f:
    doc.write(f, encoding='utf-8', method='xml')
print("FINISHED")
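The line rootElement.append(sub)
appends the new entry, but what ends up in the tree is the same instance for every new sub that gets created. (It overwrites the element again and again.)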
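To narrow this down, a small check along the following lines might help. It is only a sketch: the two-entry SAMPLE input and the collect helper are made up for illustration and are not part of my script. It compares calling append inside the loop with calling it once after the loop has finished, which is one possible way to end up with a single element only.

```python
from lxml import etree

# Hypothetical two-entry input standing in for the large file.
SAMPLE = b"<root><entry><ip>1.1.1.1</ip></entry><entry><ip>2.2.2.2</ip></entry></root>"

def collect(append_inside_loop):
    """Return the ip attributes of the children that end up under <list>."""
    source = etree.fromstring(SAMPLE)
    root = etree.Element("list")
    sub = None
    for entry in source.xpath("//entry"):
        sub = etree.fromstring("<url ip='{0}'/>".format(entry.findtext("ip")))
        if append_inside_loop:
            root.append(sub)      # one append per entry
    if not append_inside_loop and sub is not None:
        root.append(sub)          # a single append after the loop has ended
    return [child.get("ip") for child in root]

inside = collect(True)
outside = collect(False)
print(inside)
print(outside)
```

If the real script behaves like the second case, the indentation of the append line would be the first thing to look at.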
By the way, an entry in the large XML file looks like this:
<entry>
    <line>1</line>
    <id>88871349</id>
    <first>1456571225</first>
    <last>0</last>
    <md5></md5>
    <virustotal>http://www.virustotal.com/latest-report.html?resource=b637e57062280a903ee05f397feeea24</virustotal>
    <vt_score>27/56 (48.2%)</vt_score>
    <scanner>undef</scanner>
    <virusname><![CDATA[Generic7.RAL]]></virusname>
    <url><![CDATA[http://xiazaiqi2.xpgod.com/down/yy%20at%20134_1062.exe]]></url>
    <recent>up</recent>
    <response>alive</response>
    <ip>222.186.130.206</ip>
    <as>AS4134</as>
    <review>61.164.110.151</review>
    <domain>xpgod.com</domain>
    <country>CN</country>
    <source>APNIC</source>
    <email>anti_spam@wz.zj.cn</email>
    <inetnum>61.164.108.0 - 61.164.111.255</inetnum>
    <netname>RUIAN-TELECOM</netname>
    <descr><![CDATA[Ruian Telecom]]></descr>
    <ns1>ns3.dnsv3.com</ns1>
    <ns2>ns4.dnsv3.com</ns2>
    <ns3></ns3>
    <ns4></ns4>
    <ns5></ns5>
</entry>
Is there another way to append the entries while keeping the same output format as mine?
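For reference, one alternative I can think of is to build each element with etree.SubElement and etree.CDATA instead of parsing a formatted string. The following is a minimal sketch under assumptions, not a tested replacement for the script: the build_list helper is a made-up name, the first SAMPLE entry is trimmed from the entry shown above, and the second entry is invented purely for illustration.

```python
from lxml import etree

# Trimmed sample input; the second entry is hypothetical.
SAMPLE = b"""<list>
<entry>
  <virusname><![CDATA[Generic7.RAL]]></virusname>
  <url><![CDATA[http://xiazaiqi2.xpgod.com/down/yy%20at%20134_1062.exe]]></url>
  <ip>222.186.130.206</ip>
</entry>
<entry>
  <virusname><![CDATA[Example.Test]]></virusname>
  <url><![CDATA[http://example.com/sample.exe]]></url>
  <ip>192.0.2.1</ip>
</entry>
</list>"""

def build_list(source_root):
    """Return a new <list> element with one <url> child per <entry>."""
    root = etree.Element("list")
    for entry in source_root.iter("entry"):
        # SubElement creates a fresh child under root on every iteration.
        sub = etree.SubElement(
            root,
            "url",
            ip=entry.findtext("ip", default=""),
            virusname=entry.findtext("virusname", default=""),
        )
        # etree.CDATA makes the serializer wrap the text in a CDATA section.
        sub.text = etree.CDATA(entry.findtext("url", default=""))
    return root

result = build_list(etree.fromstring(SAMPLE))
output = etree.tostring(result, encoding="unicode")
print(output)
```

Setting attributes through the API also means that characters such as quotes or ampersands in virusname get escaped automatically, which the string-formatting version does not do. As far as I understand, lxml's default parser replaces CDATA sections with plain text (the strip_cdata parser option), so setting etree.CDATA explicitly may be needed for the CDATA wrapper to survive in the output. The result could then be written with etree.ElementTree(result).write(dst, encoding="utf-8").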