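How to use a list comprehension to extract values from nested dictionaries

Time: 2013-09-30 16:36:18

Tags: python xml list-comprehension

I am trying to extract some data from XML. I use xmltodict to load the data into a dictionary, then use list comprehensions to pull the individual parts out into separate lists. I will plot these later with matplotlib.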

XML:

<?xml version="1.0" ?>
<MYDATA>
<SESSION ID="1234">
    <INFO>
        <BEGIN LOAD="23"/>
    </INFO>
    <TRANSACTION ID="2103645570">
        <ANSWER>Hello</ANSWER>
    </TRANSACTION>
    <TRANSACTION ID="4315547431">
        <ANSWER>This is an answer</ANSWER>
    </TRANSACTION>
</SESSION>
<SESSION ID="5678">
    <INFO>
        <BEGIN LOAD="28"/>
    </INFO>
    <TRANSACTION ID="4099381642">
        <ANSWER>Hello</ANSWER>
    </TRANSACTION>
    <TRANSACTION ID="1220404184">
        <ANSWER>A Different answer</ANSWER>
    </TRANSACTION>
    <TRANSACTION ID="201506542">
        <ANSWER>Yet another one</ANSWER>
    </TRANSACTION>
</SESSION>
</MYDATA>

My code:

from collections import OrderedDict

# doc contains the xml exactly as loaded by xmltodict
doc = OrderedDict([(u'MYDATA', OrderedDict([(u'SESSION', [OrderedDict([(u'@ID', u'1234'), (u'INFO', OrderedDict([(u'BEGIN', OrderedDict([(u'@LOAD', u'23')]))])), (u'TRANSACTION', [OrderedDict([(u'@ID', u'2103645570'), (u'ANSWER', u'Hello')]), OrderedDict([(u'@ID', u'4315547431'), (u'ANSWER', u'This is an answer')])])]), OrderedDict([(u'@ID', u'5678'), (u'INFO', OrderedDict([(u'BEGIN', OrderedDict([(u'@LOAD', u'28')]))])), (u'TRANSACTION', [OrderedDict([(u'@ID', u'4099381642'), (u'ANSWER', u'Hello')]), OrderedDict([(u'@ID', u'1220404184'), (u'ANSWER', u'A Different answer')]), OrderedDict([(u'@ID', u'201506542'), (u'ANSWER', u'Yet another one')])])])])]))])

sess_ids = [i['@ID'] for i in doc['MYDATA']['SESSION']]
print sess_ids

sess_loads = [i['INFO']['BEGIN']['@LOAD'] for i in doc['MYDATA']['SESSION']]
print sess_loads

trans_ids = [[j['@ID'] for j in i['TRANSACTION']] for i in doc['MYDATA']['SESSION']]
print trans_ids

Output:

sess_ids:    [u'1234', u'5678']
sess_loads:  [u'23', u'28']
trans_ids:   [[u'2103645570', u'4315547431'], [u'4099381642', u'1220404184', u'201506542']]

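You can see that I am able to access the ID attribute of the SESSION elements and the LOAD attribute of the BEGIN elements.

I need to get the ID attributes of the TRANSACTION elements as a single list. At the moment, the variable trans_ids holds a list of lists.

How can I get a flat list of the values?

I tried: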

[j['@ID'] for j in i['TRANSACTION'] for i in doc['MYDATA']['SESSION']]

But this just repeats each ID from the second session twice, giving:

[u'4099381642',
 u'4099381642',
 u'1220404184',
 u'1220404184',
 u'201506542',
 u'201506542']

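3 answers:

Answer 0 (score: 2)

Do you need to go through a dictionary at all? This kind of thing is fairly simple when working on the XML directly: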

>>> import xml.etree.ElementTree as etree
>>> # xml_string is a placeholder name for the XML shown above, held as a string
>>> txml = etree.fromstring(xml_string)
>>> txml.findall('SESSION/TRANSACTION')
[<Element TRANSACTION at 0x4064f9d8>,
 <Element TRANSACTION at 0x4064fa20>,
 <Element TRANSACTION at 0x4064f990>,
 <Element TRANSACTION at 0x4064fa68>,
 <Element TRANSACTION at 0x4064fab0>]
>>> [x.get('ID') for x in txml.findall('SESSION/TRANSACTION')]
['2103645570', '4315547431', '4099381642', '1220404184', '201506542']

At least, it seems more compact to me.
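For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch. It assumes the XML from the question is held in a string (the name XML_TEXT is made up for this example, and the XML is compacted onto fewer lines) and it is written for Python 3, so the values come back as plain strings without the u prefix. It also pulls out the session IDs and loads in the same way, which the answer above does not show.

```python
import xml.etree.ElementTree as ET

# Copy of the XML from the question, compacted; XML_TEXT is a hypothetical name.
XML_TEXT = """<?xml version="1.0" ?>
<MYDATA>
<SESSION ID="1234">
    <INFO><BEGIN LOAD="23"/></INFO>
    <TRANSACTION ID="2103645570"><ANSWER>Hello</ANSWER></TRANSACTION>
    <TRANSACTION ID="4315547431"><ANSWER>This is an answer</ANSWER></TRANSACTION>
</SESSION>
<SESSION ID="5678">
    <INFO><BEGIN LOAD="28"/></INFO>
    <TRANSACTION ID="4099381642"><ANSWER>Hello</ANSWER></TRANSACTION>
    <TRANSACTION ID="1220404184"><ANSWER>A Different answer</ANSWER></TRANSACTION>
    <TRANSACTION ID="201506542"><ANSWER>Yet another one</ANSWER></TRANSACTION>
</SESSION>
</MYDATA>
"""

# fromstring() parses a string and returns the root element (MYDATA here).
root = ET.fromstring(XML_TEXT)

# Paths are relative to the root, so they start at SESSION.
sess_ids = [s.get("ID") for s in root.findall("SESSION")]
sess_loads = [b.get("LOAD") for b in root.findall("SESSION/INFO/BEGIN")]
trans_ids = [t.get("ID") for t in root.findall("SESSION/TRANSACTION")]

print(sess_ids)
print(sess_loads)
print(trans_ids)
```

Because findall walks the document in order, trans_ids is already flat and keeps the transactions in the order they appear in the file.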

Answer 1 (score: 1)

  

From the question: "I tried:"

[j['@ID'] for j in i['TRANSACTION'] for i in doc['MYDATA']['SESSION']]

You almost had it. Just swap the order of the two for..in clauses:

>>> [j['@ID'] for i in doc['MYDATA']['SESSION'] for j in i['TRANSACTION']]
[u'2103645570', u'4315547431', u'4099381642', u'1220404184', u'201506542']

To understand this, take a look at this example:

>>> a = [[1, 2, 3], [4, 5, 6]]
>>> [j for j in i for i in a]
[4, 4, 5, 5, 6, 6]
>>> [j for i in a for j in i]
[1, 2, 3, 4, 5, 6]

When a list comprehension has multiple for..in clauses, they are evaluated from left to right. So if the nested loop would look like this:

for i in a:
    for j in i:
        j

then you must write the clauses in that same order, outermost first, rather than from the inside out:

[j for i in a for j in i]
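It may also help to see why the wrong order produced doubled IDs instead of an error. In Python 2, the loop variable of a list comprehension leaks into the surrounding scope, so after the earlier comprehensions in the question, i was still bound to the last session. The wrong-order comprehension therefore iterated over that one session's transactions and emitted each ID once per session. The sketch below emulates this in Python 3 with an explicit variable; sessions and stale are made-up names, and sessions is a trimmed stand-in for doc['MYDATA']['SESSION'], not the real xmltodict output.

```python
# Hypothetical stand-in for doc['MYDATA']['SESSION'], trimmed to the ID fields.
sessions = [
    {"@ID": "1234", "TRANSACTION": [{"@ID": "2103645570"}, {"@ID": "4315547431"}]},
    {"@ID": "5678", "TRANSACTION": [{"@ID": "4099381642"},
                                    {"@ID": "1220404184"},
                                    {"@ID": "201506542"}]},
]

# Correct order written as explicit loops: outer loop first, inner loop second.
flat_loop = []
for sess in sessions:
    for trans in sess["TRANSACTION"]:
        flat_loop.append(trans["@ID"])

# The same thing as a comprehension: the for clauses appear in the same order.
flat_comp = [trans["@ID"] for sess in sessions for trans in sess["TRANSACTION"]]

# Emulating the wrong order: `stale` plays the role of the leaked variable i,
# which still pointed at the last session when the comprehension started.
stale = sessions[-1]
repeated = [trans["@ID"] for trans in stale["TRANSACTION"] for sess in sessions]

print(flat_comp)
print(repeated)
```

In a fresh Python 3 session the wrong-order comprehension would normally fail with a NameError instead, because the comprehension's loop variables no longer leak.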

Answer 2 (score: 0)

from itertools import chain
list(chain(*trans_ids))
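A small variation on the same idea, as a sketch: chain.from_iterable takes the list of lists directly instead of unpacking it into separate arguments, which reads a little more clearly and also works when the outer sequence is a generator. The trans_ids value below is copied from the question's output, written as plain strings for Python 3.

```python
from itertools import chain

# The nested list from the question's output (one inner list per session).
trans_ids = [
    ["2103645570", "4315547431"],
    ["4099381642", "1220404184", "201506542"],
]

# from_iterable() lazily walks each inner list in turn; list() collects the result.
flat = list(chain.from_iterable(trans_ids))
print(flat)
```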