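Retrieving an element's attributes with libxml2 in Python

Posted: 2013-09-13 05:28:51

Tags: python libxml2

I'm writing my first Python script with libxml2 to retrieve data from an XML file. The file looks like this: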

<myGroups1>
<myGrpContents name="ABC" help="abc_help">
     <myGrpKeyword name="abc1" help="help1"/>
     <myGrpKeyword name="abc2" help="help2"/>
     <myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>

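The file contains many similar groups. My goal is to read the "name" and "help" attributes and write them to another file in a different format. However, with the following code I can only retrieve the myGroups1 element: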

import libxml2

doc = libxml2.parseFile(cmmfilename)
root2 = doc.children      # first node of the document: <myGroups1> here
child = root2.children    # first child node of <myGroups1>
# Visit this node and each of its following siblings
while child is not None:
    if not child.isBlankNode():
        if child.type == "element":
            print "\t Element ", child.name, " with ", child.lsCountNode(), "child(ren)"
            print "\t and content ", repr(child.content)
    child = child.next

How can I iterate deeper into the elements and get their attributes? Any help would be much appreciated.
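A likely reason the loop above stays shallow: it only ever follows .next, so it visits one node and its siblings but never their children. Below is a minimal Python 3 sketch of that same sibling-only walk. It uses the standard library's xml.dom.minidom in place of libxml2 (an assumption made so the example runs without the libxml2 bindings); minidom's firstChild and nextSibling play the roles of libxml2's children and next. SAMPLE is a copy of the XML above, and sibling_walk and top_level are illustrative names only.

```python
from xml.dom import Node, minidom

SAMPLE = """<myGroups1>
<myGrpContents name="ABC" help="abc_help">
     <myGrpKeyword name="abc1" help="help1"/>
     <myGrpKeyword name="abc2" help="help2"/>
     <myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>"""


def sibling_walk(first):
    """Return tag names of `first` and its following siblings, without descending."""
    names = []
    node = first
    while node is not None:
        # Only element nodes are recorded; whitespace text nodes between tags are skipped
        if node.nodeType == Node.ELEMENT_NODE:
            names.append(node.tagName)
        node = node.nextSibling  # move across, never down
    return names


doc = minidom.parseString(SAMPLE)
root = doc.documentElement                 # <myGroups1>
top_level = sibling_walk(root.firstChild)  # start at its first child, like root2.children
print(top_level)  # ['myGrpContents']
```

The myGrpKeyword elements are never reached; getting to them requires descending into each element's children as well as moving across siblings.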

2 answers:

Answer 0 (score: 0)

I haven't used libxml2 myself, but I dug into it and found this.

Try one of these:

# Option 1: read each attribute by name with prop()
if child.type == "element":
    if child.name == "myGrpKeyword":
        print child.prop('name')
        print child.prop('help')

# Option 2: loop over the element's attribute nodes
if child.type == "element":
    if child.name == "myGrpKeyword":
        for attr in child.properties:
            # iterating also walks into each attribute's text child, hence the type check
            if attr.type == 'attribute':
                # check which attribute this is
                if attr.name == 'name':
                    print attr.content
                if attr.name == 'help':
                    print attr.content

Reference: http://ukchill.com/technology/getting-started-with-libxml2-and-python-part-1/
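For comparison, here is a sketch of the same two approaches using Python 3's standard xml.etree.ElementTree, swapped in for libxml2 (this is not the libxml2 API). Element.get() plays the role of prop(), and Element.attrib is a plain dict holding all of an element's attributes. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

keyword = ET.fromstring('<myGrpKeyword name="abc1" help="help1"/>')

# Approach 1: look up each attribute by name (counterpart of prop())
name = keyword.get("name")
help_text = keyword.get("help")
missing = keyword.get("whoops")  # None when the attribute is absent

# Approach 2: loop over all attributes and keep the ones we want
# (counterpart of walking child.properties)
wanted = {}
for attr_name, value in keyword.attrib.items():
    if attr_name in ("name", "help"):
        wanted[attr_name] = value

print(name, help_text)  # abc1 help1
print(missing)          # None
print(wanted)
```

With ElementTree the attributes are already a dict, so there are no separate attribute or text nodes to filter out.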

Update

Try a recursive function:

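import libxml2

def explore(child):
    # Visit this node and each of its following siblings
    while child is not None:
        if not child.isBlankNode():
            if child.type == "element":
                print child.prop('name')   # None if the element has no such attribute
                print child.prop('help')
                explore(child.children)    # recurse into this element's children
        child = child.next

doc = libxml2.parseFile(cmmfilename)
root2 = doc.children
child = root2.children
explore(child)
doc.freeDoc()  # release the libxml2 document when done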
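The same recursive idea as a self-contained Python 3 sketch, again with xml.dom.minidom standing in for libxml2 (an assumption so it runs on the standard library alone). The function name mirrors explore above. Collecting (name, help) pairs into a list instead of printing them, and skipping elements that lack either attribute, are additions for illustration.

```python
from xml.dom import Node, minidom

SAMPLE = """<myGroups1>
<myGrpContents name="ABC" help="abc_help">
     <myGrpKeyword name="abc1" help="help1"/>
     <myGrpKeyword name="abc2" help="help2"/>
     <myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>"""


def explore(node, found):
    """Walk `node` and its siblings, descending into every element; collect (name, help)."""
    while node is not None:
        if node.nodeType == Node.ELEMENT_NODE:
            # Record the element only if it carries both attributes
            if node.hasAttribute("name") and node.hasAttribute("help"):
                found.append((node.getAttribute("name"), node.getAttribute("help")))
            explore(node.firstChild, found)  # go down one level first...
        node = node.nextSibling              # ...then move across
    return found


doc = minidom.parseString(SAMPLE)
pairs = explore(doc.documentElement.firstChild, [])
print(pairs)
# [('ABC', 'abc_help'), ('abc1', 'help1'), ('abc2', 'help2'), ('abc3', 'help3')]
```

The shape is the key point: the while loop moves across siblings, and the recursive call moves down into children, so every element in the tree is visited.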

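Answer 1 (score: 0)

The question "python. how to get attribute value with libxml2" may have the answer you're looking for.

When I run into a problem like this and, for whatever reason, don't want to read the documentation, I find it helpful to explore the library interactively. I suggest trying this in an interactive Python REPL (I like bpython). Here is the session in which I worked out a solution: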

>>> import libxml2
>>> xml = """<myGroups1>
... <myGrpContents name="ABC" help="abc_help">
...      <myGrpKeyword name="abc1" help="help1"/>
...      <myGrpKeyword name="abc2" help="help2"/>
...      <myGrpKeyword name="abc3" help="help3"/>
... </myGrpContents>
... </myGroups1>"""
>>> tree = libxml2.parseMemory(xml, len(xml)) # I found this method by looking through `dir(libxml2)`
>>> tree.children
<xmlNode (myGroups1) object at 0x10aba33b0>
>>> a = tree.children
>>> a
<xmlNode (myGroups1) object at 0x10a919ea8>
>>> a.children
<xmlNode (text) object at 0x10ab24368>
>>> a.properties
>>> b = a.children
>>> b.children
>>> b.properties
>>> b.next
<xmlNode (myGrpContents) object at 0x10a921290>
>>> b.next.content
'\n     \n     \n     \n'
>>> b.next.next.content
'\n'
>>> b.next.next.next.content
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'content'
>>> b.next.next.next
>>> b.next.properties
<xmlAttr (name) object at 0x10aba32d8>
>>> b.next.properties.children
<xmlNode (text) object at 0x10ab40f38>
>>> b.next.properties.children.content
'ABC'
>>> b.next.properties.children.name
'text'
>>> b.next.properties.next
<xmlAttr (help) object at 0x10ab40fc8>
>>> b.next.properties.next.name
'help'
>>> b.next.properties.next.content
'abc_help'
>>> list(tree)
[<xmlDoc (None) object at 0x10a921248>, <xmlNode (myGroups1) object at 0x10aba32d8>, <xmlNode (text) object at 0x10aba3878>, <xmlNode (myGrpContents) object at 0x10aba3d88>, <xmlNode (text) object at 0x10aba3950>, <xmlNode (myGrpKeyword) object at 0x10aba3758>, <xmlNode (text) object at 0x10aba3320>, <xmlNode (myGrpKeyword) object at 0x10aba3f38>, <xmlNode (text) object at 0x10aba3560>, <xmlNode (myGrpKeyword) object at 0x10aba3998>, <xmlNode (text) object at 0x10aba33f8>, <xmlNode (text) object at 0x10aba38c0>]
>>> good = list(tree)[5]
>>> good.properties
<xmlAttr (name) object at 0x10aba35f0>
>>> good.prop('name')
'abc1'
>>> good.prop('help')
'help1'
>>> good.prop('whoops')
>>> good.hasProp('whoops')
>>> good.hasProp('name')
<xmlAttr (name) object at 0x10ab40ef0>
>>> good.hasProp('name').content
'abc1'
>>> for thing in tree:
...     if thing.hasProp('name') and thing.hasProp('help'):
...         print thing.prop('name'), thing.prop('help')
...         
...     
... 
ABC abc_help
abc1 help1
abc2 help2
abc3 help3

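Since this is bpython, I cheated a little: it has a rewind key, so I actually made more typos than this shows, but otherwise this is close to what I typed.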
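The final loop in the session above works because iterating a libxml2 document walks every node depth-first, as list(tree) shows. Below is a minimal Python 3 sketch of the same filter using the standard library's xml.etree.ElementTree, swapped in for libxml2. Element.iter() is also depth-first and includes the root itself. The helper names are illustrative, and the tab-separated output is only a placeholder, since the question doesn't say what format the target file needs.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<myGroups1>
<myGrpContents name="ABC" help="abc_help">
     <myGrpKeyword name="abc1" help="help1"/>
     <myGrpKeyword name="abc2" help="help2"/>
     <myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>"""


def name_help_pairs(xml_text):
    """Return (name, help) for every element carrying both attributes, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree depth-first, starting with the root itself
    return [
        (elem.get("name"), elem.get("help"))
        for elem in root.iter()
        if "name" in elem.attrib and "help" in elem.attrib
    ]


def format_pairs(pairs):
    """Render one 'name<TAB>help' line per pair (a placeholder output format)."""
    return "".join(f"{name}\t{help_text}\n" for name, help_text in pairs)


pairs = name_help_pairs(SAMPLE)
for name, help_text in pairs:
    print(name, help_text)
```

To produce the other file, write format_pairs(pairs) to a file opened with open(path, "w"); both the path and the layout are placeholders for whatever format is actually required.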