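Basic Python XML data extraction

Time: 2013-11-19 22:47:17

Tags: python lxml

I've had a serious brain fart for the past few days. Although I'm sure I could have done this a few months ago, I can't work out how to extract the data elements from this output. :(

Grp = flickr.groups_getInfo( group_id = gid ) returns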

Grp =

<group id="34427465497@N01" iconserver="1" iconfarm="1" lang="en-us" ispoolmoderated="0">
    <name>GNEverybody</name>
    <description>The group for GNE players</description>
    <members>69</members>
    <privacy>3</privacy>
    <throttle count="10" mode="month" remaining="3" />
    <restrictions photos_ok="1" videos_ok="1" images_ok="1" screens_ok="1" art_ok="1" safe_ok="1" moderate_ok="0" restricted_ok="0" has_geo="0" />
</group>

To extract the individual data elements, should it be:

group_id = Grp.id
group_name = Grp.name
safe_ok = Grp.restrictions.safe_ok

and so on?

3 answers:

Answer 0 (score: 2)

Extracting elements from this item requires a slightly more verbose syntax.

>>> from lxml import etree
>>> example = etree.fromstring("""
<group id="34427465497@N01" iconserver="1" iconfarm="1" lang="en-us" ispoolmoderated="0">
    <name>GNEverybody</name>
    <description>The group for GNE players</description>
    <members>69</members>
    <privacy>3</privacy>
    <throttle count="10" mode="month" remaining="3" />
    <restrictions photos_ok="1" videos_ok="1" images_ok="1" screens_ok="1" art_ok="1" safe_ok="1" moderate_ok="0" restricted_ok="0" has_geo="0" />
</group>
""")

# Attributes can be accessed in two ways:
>>> example.attrib  # Returns a dictionary of key, value pairs
{'iconserver': '1', 'lang': 'en-us', 'ispoolmoderated': '0', 'id': '34427465497@N01', 'iconfarm': '1'}
>>> example.get('id')  # Grabs a specific key in the attribs dict.
'34427465497@N01'

# Child elements are accessed using the getchildren() method
# (deprecated in current lxml; list(example) returns the same list):
>>> example.getchildren()  # Returns a list of items.
[<Element name at 0x1007c7140>, <Element description at 0x1007c7190>, <Element members at 0x1007c71e0>, <Element privacy at 0x1007c7230>, <Element throttle at 0x1007c7280>, <Element restrictions at 0x1007c72d0>]

Another way to extract child elements is to use XPath:

>>> example.xpath(u'//description')  # returns a list of elements which matched the tag name.
[<Element description at 0x1004d82d0>]

The description Element is accessed in the same way as the parent node:

>>> desc = example.xpath(u'//description')
>>> desc[0].tag
'description'
>>> desc[0].attrib  # This node has no attributes.
{}

Other elements may carry attributes:

>>> example.xpath(u'//restrictions')[0].attrib
{'photos_ok': '1', 'images_ok': '1', 'safe_ok': '1', 'has_geo': '0', 'screens_ok': '1', 'videos_ok': '1', 'moderate_ok': '0', 'restricted_ok': '0', 'art_ok': '1'}

See dir(example) for the full list of methods available on an lxml.etree element.
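Putting the pieces above together, here is a minimal sketch that pulls out the three values the question asks about. It assumes the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra packages; lxml.etree elements offer the same get(), find() and findtext() methods. The names GROUP_XML and extract_group_fields are illustrative and not part of either library.

```python
import xml.etree.ElementTree as ET

# Sample response copied from the question (illustrative constant name).
GROUP_XML = """\
<group id="34427465497@N01" iconserver="1" iconfarm="1" lang="en-us" ispoolmoderated="0">
    <name>GNEverybody</name>
    <description>The group for GNE players</description>
    <members>69</members>
    <privacy>3</privacy>
    <throttle count="10" mode="month" remaining="3" />
    <restrictions photos_ok="1" videos_ok="1" images_ok="1" screens_ok="1" art_ok="1" safe_ok="1" moderate_ok="0" restricted_ok="0" has_geo="0" />
</group>
"""


def extract_group_fields(xml_text):
    """Return the id attribute, the name text and the safe_ok attribute."""
    group = ET.fromstring(xml_text)
    restrictions = group.find("restrictions")  # None if the child is missing
    return {
        # Attributes of the root element are read with get().
        "group_id": group.get("id"),
        # Text of a direct child is read with findtext().
        "group_name": group.findtext("name"),
        # Attributes of a child element: find() the child, then get().
        "safe_ok": restrictions.get("safe_ok") if restrictions is not None else None,
    }


fields = extract_group_fields(GROUP_XML)
print(fields)
```

Every value comes back as a string, so a count such as members would still need int() before doing arithmetic on it.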

Answer 1 (score: 2)

Simple XML

There is a solution in this question:

Same question

XML parsing in Python

Reference

http://docs.python.org/2/library/xml.dom.minidom
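For readers who follow the minidom reference, a minimal sketch of the same extraction with xml.dom.minidom might look like this. The XML is a trimmed copy of the response in the question, and GROUP_XML is an illustrative name.

```python
from xml.dom import minidom

# Trimmed copy of the response from the question (illustrative constant name).
GROUP_XML = """\
<group id="34427465497@N01" lang="en-us">
    <name>GNEverybody</name>
    <members>69</members>
    <restrictions photos_ok="1" safe_ok="1" moderate_ok="0" />
</group>
"""

doc = minidom.parseString(GROUP_XML)
group = doc.documentElement  # the <group> root element

# Attributes are read with getAttribute().
group_id = group.getAttribute("id")

# Element text lives in a child text node, hence firstChild.data.
group_name = group.getElementsByTagName("name")[0].firstChild.data

# Attributes of a child element: locate the element, then getAttribute().
safe_ok = group.getElementsByTagName("restrictions")[0].getAttribute("safe_ok")

print(group_id, group_name, safe_ok)
```

minidom mirrors the W3C DOM API, so it is more verbose than the ElementTree-style calls in the other answers, but it needs nothing beyond the standard library.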

Answer 2 (score: 2)

A variation on @VooDooNOFX's answer is to use lxml.objectify:

>>> import lxml.objectify
>>> group = lxml.objectify.fromstring("""<group id="34427465497@N01" iconserver="1" iconfarm="1" lang="en-us" ispoolmoderated="0">
...     <name>GNEverybody</name>
...     <description>The group for GNE players</description>
...     <members>69</members>
...     <privacy>3</privacy>
...     <throttle count="10" mode="month" remaining="3" />
...     <restrictions photos_ok="1" videos_ok="1" images_ok="1" screens_ok="1" art_ok="1" safe_ok="1" moderate_ok="0" restricted_ok="0" has_geo="0" />
... </group>""")
>>> group.get("id")
'34427465497@N01'
>>> group.name
'GNEverybody'
>>> group.restrictions.get("safe_ok")
'1'
>>>
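The values shown in the session above are objectify wrapper elements, not plain Python objects. As a rough sketch, assuming lxml is installed, the .text and .pyval properties convert them to ordinary values. The XML is a trimmed copy of the response in the question, and GROUP_XML is an illustrative name.

```python
from lxml import objectify

# Trimmed copy of the response from the question (illustrative constant name).
GROUP_XML = """\
<group id="34427465497@N01" lang="en-us">
    <name>GNEverybody</name>
    <members>69</members>
    <restrictions photos_ok="1" safe_ok="1" moderate_ok="0" />
</group>
"""

group = objectify.fromstring(GROUP_XML)

# Child elements are reachable as attributes; .text gives a plain str.
group_name = group.name.text

# objectify infers a numeric type from the element text; .pyval is a plain int here.
members = group.members.pyval

# XML attributes are still read with get() and remain strings.
group_id = group.get("id")
safe_ok = group.restrictions.get("safe_ok")

print(group_id, group_name, members, safe_ok)
```

This is the closest match to the dotted access the question hoped for, with the caveat that XML attributes such as id and safe_ok still go through get() rather than dotted names.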