Find with a broken namespace

Time: 2016-10-01 09:08:12

Tags: python xml xml-namespaces elementtree

I have downloaded this XML file.

I am trying to get the includingNote as follows:

...
namespaces = { "skos" : "http://www.w3.org/2004/02/skos/core#", "xml" : "http://www.w3.org/XML/1998/namespace", 
                 "udc" : "http://udcdata.info/udc-schema#" }
...


includingNote = child.find("udc:includingNote[@xml:lang='en']", namespaces)
if includingNote:
  print(includingNote.text)

The schema is located here, and it seems to be broken.

Is there a way I can print the includingNote for each child node?

1 answer:

Answer 0 (score: 1)

It is true that the skos prefix is not declared in udc-scheme, but that is not a problem for searching the XML document.
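To illustrate why prefix declarations elsewhere do not affect the search, here is a minimal sketch. The sample document is hypothetical and much smaller than the real file; only the namespace URI is taken from the question. ElementTree matches elements by namespace URI, so the prefix used in the query (the arbitrary "u" below) does not have to match the prefix used in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; only the udc namespace URI comes from the question.
SAMPLE = """<root xmlns:udc="http://udcdata.info/udc-schema#">
  <udc:includingNote xml:lang="en">Example note</udc:includingNote>
</root>"""

root = ET.fromstring(SAMPLE)

# The prefix "u" is arbitrary: it only has to map to the right URI.
ns = {"u": "http://udcdata.info/udc-schema#"}
by_prefix = root.find("u:includingNote", ns)

# The same lookup without any prefix mapping, using the {uri}tag form.
by_uri = root.find("{http://udcdata.info/udc-schema#}includingNote")

print(by_prefix is by_uri)
print(by_prefix.text)
```

Both lookups return the same element, because the prefix is expanded to the full URI before matching.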

The following program extracts 639 includingNote elements:

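# cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically.
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping used to resolve the prefixes in the search expressions.
namespaces = {"udc" : "http://udcdata.info/udc-schema#",
              "xml" : "http://www.w3.org/XML/1998/namespace"}

doc = ET.parse("udcsummary-skos.rdf")
# ".//" searches all descendants, not just direct children.
includingNotes = doc.findall(".//udc:includingNote[@xml:lang='en']", namespaces)

print(len(includingNotes))   # 639

for i in includingNotes:
    print(i.text)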

Note the use of findall(), and of .// in front of the element name in order to search the whole document.
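The difference the .// prefix makes can be seen in a small sketch. The inline document below is a hypothetical stand-in for the real file (its nesting is an assumption), and the variable names direct and deep are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real file: notes nested inside skos:Concept elements.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:udc="http://udcdata.info/udc-schema#">
  <skos:Concept>
    <udc:includingNote xml:lang="en">First note</udc:includingNote>
    <udc:includingNote xml:lang="fr">Premiere note</udc:includingNote>
  </skos:Concept>
  <skos:Concept>
    <udc:includingNote xml:lang="en">Second note</udc:includingNote>
  </skos:Concept>
</rdf:RDF>"""

namespaces = {"udc": "http://udcdata.info/udc-schema#",
              "xml": "http://www.w3.org/XML/1998/namespace"}

root = ET.fromstring(SAMPLE)

# Without ".//" only direct children of the root are examined.
direct = root.findall("udc:includingNote[@xml:lang='en']", namespaces)

# With ".//" every descendant is examined.
deep = root.findall(".//udc:includingNote[@xml:lang='en']", namespaces)

print(len(direct), len(deep))   # 0 2
texts = [e.text for e in deep]
print(texts)
```

In this sample the notes sit one level below the root, so the plain expression finds nothing while the .// form finds both English notes.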

Here is a variant that returns the same information by first finding all Concept elements:

# cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically.
import xml.etree.ElementTree as ET

namespaces = {"udc" : "http://udcdata.info/udc-schema#",
              "skos" : "http://www.w3.org/2004/02/skos/core#",
              "xml" : "http://www.w3.org/XML/1998/namespace"}

doc = ET.parse("udcsummary-skos.rdf")
concepts = doc.findall(".//skos:Concept", namespaces)

for c in concepts:
    # find() returns the first matching direct child of the Concept, or None.
    includingNote = c.find("udc:includingNote[@xml:lang='en']", namespaces)
    if includingNote is not None:
        print(includingNote.text)

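Note the use of is not None. Without it, the code does not work: ElementTree has historically treated an Element that has no child elements as false in a boolean test, so a bare if includingNote: skips elements that contain only text. This is a peculiarity of ElementTree. See "Why does bool(xml.etree.ElementTree.Element) evaluate to False?"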
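A minimal sketch of the pitfall, using a hypothetical two-element document. It inspects len() rather than calling bool() directly, because newer Python versions emit a deprecation warning when an Element is tested for truth.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <note> holds text but no child elements.
root = ET.fromstring("<concept><note>text only</note></concept>")

note = root.find("note")       # an existing element
missing = root.find("absent")  # no such element, so find() returns None

# The element was found, yet it has zero children. Element truthiness has
# historically followed len(), which is why "if note:" skipped it.
print(note is not None)
print(len(note))
print(missing is None)
```

Comparing against None distinguishes "not found" from "found, but without children", which is the distinction the loop needs.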