Python: Using lxml + objectify + findall or fromstring to get specific node values and attributes

Date: 2014-07-01 13:03:45

Tags: python xml parsing lxml

I extracted a portion of an XML feed from NVD. Here is the snippet:

<?xml version='1.0' encoding='UTF-8'?>
<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://nvd.nist.gov/feeds/cve/1.2 http://nvd.nist.gov/schema/nvdcve.xsd" pub_date="2014-07-01" nvd_xml_version="1.2">
   <entry CVSS_base_score="6.4" CVSS_exploit_subscore="10.0" CVSS_impact_subscore="4.9" CVSS_score="6.4" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:P/A:N)" CVSS_version="2.0" modified="2014-06-30" name="CVE-2011-1381" published="2014-06-27" seq="2011-1381" severity="Medium" type="CVE">
      <desc>
        <descript source="cve">Unspecified vulnerability in IBM OpenPages GRC Platform 6.1.0.1 before IF4 allows remote attackers to bypass intended access restrictions via unknown vectors.</descript>
      </desc>
   </entry>
   <entry CVSS_base_score="3.5" CVSS_exploit_subscore="6.8" CVSS_impact_subscore="2.9" CVSS_score="3.5" CVSS_vector="(AV:N/AC:M/Au:S/C:P/I:N/A:N)" CVSS_version="2.0" modified="2014-06-30" name="CVE-2014-4669" published="2014-06-28" seq="2014-4669" severity="Low" type="CVE">
      <desc>
        <descript source="cve">HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files via a WSDL document containing an XML external entity declaration in conjunction with an entity reference within a GetQuote operation, related to an XML External Entity (XXE) issue.</descript>
      </desc>
   </entry>
</nvd>

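As the title of this question and the snippet above suggest, I only want to get the values and attributes of the 'descript' nodes. I tried the findall method, but it returns an empty list: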

from lxml import etree

root = etree.fromstring(open("c:/temp/CVE/sample.xml", "rb").read())
root.findall('entry')

It returns:

[]

When I print the root tag, this is what I get:

'{http://nvd.nist.gov/feeds/cve/1.2}nvd'

I also tried printing the tags of the root's immediate children and of their children:

for e in root.iterchildren():
    print("Immediate parent : %s" % e.tag)
    for c in e.getchildren():
        print("\t\tchildren : %s" % c.tag)

Here is what it prints:

Immediate parent : {http://nvd.nist.gov/feeds/cve/1.2}entry
    children : {http://nvd.nist.gov/feeds/cve/1.2}desc
Immediate parent : {http://nvd.nist.gov/feeds/cve/1.2}entry
    children : {http://nvd.nist.gov/feeds/cve/1.2}desc

Again, I only want to get the content and attributes of the 'descript' nodes. Any ideas are greatly appreciated. Thanks in advance!

1 Answer:

Answer 0 (score: 1)

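Every element in the file belongs to the default namespace declared on <nvd>, so an unqualified name such as 'entry' matches nothing. You need to bind a prefix to that namespace and use it in the xpath expression: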

from lxml import etree

# fromstring() already returns the root element; read bytes because the file has an encoding declaration
tree = etree.fromstring(open("c:/temp/CVE/sample.xml", "rb").read())
for descript in tree.xpath('//ns:entry/ns:desc/ns:descript', namespaces={'ns': 'http://nvd.nist.gov/feeds/cve/1.2'}):
    print(descript.text)
    print(descript.attrib.get('source'))

This prints:

Unspecified vulnerability in IBM OpenPages GRC Platform 6.1.0.1 before IF4 allows remote attackers to bypass intended access restrictions via unknown vectors.
cve
HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files via a WSDL document containing an XML external entity declaration in conjunction with an entity reference within a GetQuote operation, related to an XML External Entity (XXE) issue.
cve
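
Since the question tried findall, here is a minimal sketch of how the same lookup might be done with findall instead of xpath. It uses a shortened inline stand-in for sample.xml (the descriptions are placeholders, not the real NVD text), and the names NVD_NS, SAMPLE and results are made up for illustration. The fallback import assumes the standard library's ElementTree accepts the same findall arguments, which holds for the calls used here.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: the standard library offers the same findall() signature
    import xml.etree.ElementTree as etree

NVD_NS = "http://nvd.nist.gov/feeds/cve/1.2"

# Shortened stand-in for sample.xml; description texts are placeholders
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2">
  <entry name="CVE-2011-1381">
    <desc><descript source="cve">First description.</descript></desc>
  </entry>
  <entry name="CVE-2014-4669">
    <desc><descript source="cve">Second description.</descript></desc>
  </entry>
</nvd>"""

root = etree.fromstring(SAMPLE)

# Unqualified name: matches nothing, because every element is in NVD_NS
unqualified = root.findall("entry")

# Option 1: a prefix map (the prefix "ns" is arbitrary)
with_prefix = root.findall("ns:entry/ns:desc/ns:descript",
                           namespaces={"ns": NVD_NS})

# Option 2: Clark notation "{uri}tag", the same form that e.tag prints
with_clark = root.findall(
    "{%s}entry/{%s}desc/{%s}descript" % (NVD_NS, NVD_NS, NVD_NS))

# Collect the text and the 'source' attribute of each descript node
results = [(d.text, d.get("source")) for d in with_prefix]
for text, source in results:
    print(text, source)
```

Both options select the same nodes. The prefix map tends to read better for long paths, while Clark notation avoids passing a separate dictionary.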
