Python nested structures

Time: 2017-05-11 17:13:24

Tags: python list dictionary data-structures

I'm moving from Perl to Python and I'm struggling with a hash of arrays. A REST service returns this data structure:

[
    {
      "gene": "ENSG00000270076", 
      "minus_log10_p_value": 0.0271298550085406, 
      "tissue": "Thyroid", 
      "value": 0.939442373223424
    },
    {
      "gene": "ENSG00000104643", 
      "minus_log10_p_value": 0.255628260060896, 
      "tissue": "Thyroid", 
      "value": 0.555100655197016
    }
]

I want to parse it and build the Python equivalent of what I would write in Perl as:

${$tissue}{$value} = [$gene]
${Thyroid}{0.5555} = [ENSG1, ENSG2, ENSG3]

In Python, I tried a few things along these lines:

d={}
d[hit['tissue']][hit['value']].append(hit[gene])

but ran into all sorts of errors.

In the end, I'd like d to look like this:

{
    'Thyroid': {
        0.939442373223424: ['ENSG00000270076'],
        0.555100655197016: ['ENSG00000104643']
    }
}

That is, grouped by tissue, then by value, with each value holding a list of genes.

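3 answers:

Answer 0 (score: 2)

You can use the dict.setdefault() method to insert nested data structures for missing keys. Because the method returns either the value already stored under the key or the newly inserted default, you can chain the calls: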

d = {}
for hit in list_of_hits:
    tissue, value, gene = hit['tissue'], hit['value'], hit['gene']
    d.setdefault(tissue, {}).setdefault(value, []).append(gene)

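So for each tissue key, this makes sure d[tissue] is a nested dictionary. For each value key inside it, it makes sure d[tissue][value] is a list, and then appends the gene to that list.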
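As a minimal sketch of why the chaining works, and why plain indexing on an empty dict fails, consider the following. The literal keys and gene names here are illustrative placeholders, not data from the question.

```python
d = {}

# Plain indexing on a missing key raises KeyError, which is why
# d[tissue][value].append(gene) cannot work on an empty dict.
try:
    d['Thyroid'][0.5]
except KeyError as exc:
    missing = exc.args[0]

# setdefault() inserts the default only if the key is absent and
# always returns the value that is now stored under that key.
inner = d.setdefault('Thyroid', {})
same_inner = d.setdefault('Thyroid', {'ignored': True})

# Both calls returned the same nested dict object, so both appends
# end up in the same list.
inner.setdefault(0.5, []).append('ENSG1')
same_inner.setdefault(0.5, []).append('ENSG2')
print(d)
```

The second setdefault() call leaves the existing nested dict in place, so its default argument is simply discarded.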

Demo:

>>> list_of_hits = [
...     {
...       "gene": "ENSG00000270076",
...       "minus_log10_p_value": 0.0271298550085406,
...       "tissue": "Thyroid",
...       "value": 0.939442373223424
...     },
...     {
...       "gene": "ENSG00000104643",
...       "minus_log10_p_value": 0.255628260060896,
...       "tissue": "Thyroid",
...       "value": 0.555100655197016
...     }
... ]
>>> d = {}
>>> for hit in list_of_hits:
...     tissue, value, gene = hit['tissue'], hit['value'], hit['gene']
...     d.setdefault(tissue, {}).setdefault(value, []).append(gene)
...
>>> d
{'Thyroid': {0.939442373223424: ['ENSG00000270076'], 0.555100655197016: ['ENSG00000104643']}}
>>> from pprint import pprint
>>> pprint(d)
{'Thyroid': {0.555100655197016: ['ENSG00000104643'],
             0.939442373223424: ['ENSG00000270076']}}

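Be aware that floating-point values can be imprecise, so you may need to apply some rounding to normalize the values. For example, 0.555100655197016 and 0.555100655197017 are very close, but not equal: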

>>> 0.555100655197016 == 0.555100655197017
False

You could simply apply the round() function to value, keeping a number of digits that is still meaningful for your application:

d = {}
for hit in list_of_hits:
    tissue, value, gene = hit['tissue'], hit['value'], hit['gene']
    value = round(value, 4)
    d.setdefault(tissue, {}).setdefault(value, []).append(gene)
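To illustrate the effect of rounding on the grouping, here is a small sketch. The helper name group_hits, its ndigits parameter, and the shortened gene names are assumptions made for this example; the first two values are the near-equal pair from the comparison above.

```python
# Hypothetical hits; the first two values differ only in the last digit.
list_of_hits = [
    {"gene": "ENSG1", "tissue": "Thyroid", "value": 0.555100655197016},
    {"gene": "ENSG2", "tissue": "Thyroid", "value": 0.555100655197017},
    {"gene": "ENSG3", "tissue": "Thyroid", "value": 0.939442373223424},
]


def group_hits(hits, ndigits=None):
    """Group genes by tissue, then by value (rounded if ndigits is given)."""
    grouped = {}
    for hit in hits:
        value = hit["value"]
        if ndigits is not None:
            value = round(value, ndigits)
        grouped.setdefault(hit["tissue"], {}).setdefault(value, []).append(hit["gene"])
    return grouped


exact = group_hits(list_of_hits)
rounded = group_hits(list_of_hits, ndigits=4)

print(len(exact["Thyroid"]))    # one key per distinct float
print(rounded)                  # near-equal values share a key
```

Without rounding, each of the three values becomes its own key; with four digits, the two near-equal values fall under the same key and their genes share one list. How many digits to keep depends on your data.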

Answer 1 (score: 1)

You can use a simple loop to get output in the desired format!

>>> l = [{'minus_log10_p_value': 0.0271298550085406, 'gene': 'ENSG00000270076', 'tissue': 'Thyroid', 'value': 0.939442373223424}, {'minus_log10_p_value': 0.255628260060896, 'gene': 'ENSG00000104643', 'tissue': 'Thyroid', 'value': 0.555100655197016}]
>>> res = {}
>>> for each in l:
...     if each['tissue'] not in res:
...         res[each['tissue']] = {each['value']: each['gene']}
...     else:
...         res[each['tissue']][each['value']] = each['gene']
...
>>> res
{'Thyroid': {0.555100655197016: 'ENSG00000104643', 0.939442373223424: 'ENSG00000270076'}}
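Note that the loop above stores a single gene string per value, so a second hit with the same tissue and value would overwrite the first. A possible variant that keeps a list of genes per value, as the question asks for, might look like the sketch below. The third record, with gene "ENSG_EXAMPLE", is a made-up entry added only to show two genes sharing one value.

```python
hits = [
    {"gene": "ENSG00000270076", "tissue": "Thyroid", "value": 0.939442373223424},
    {"gene": "ENSG00000104643", "tissue": "Thyroid", "value": 0.555100655197016},
    # Hypothetical record, not part of the original data.
    {"gene": "ENSG_EXAMPLE", "tissue": "Thyroid", "value": 0.555100655197016},
]

res = {}
for each in hits:
    tissue, value, gene = each["tissue"], each["value"], each["gene"]
    # Create the nested dict for a tissue the first time it is seen.
    if tissue not in res:
        res[tissue] = {}
    # Create the gene list for a value the first time it is seen.
    if value not in res[tissue]:
        res[tissue][value] = []
    res[tissue][value].append(gene)

print(res)
```

This does the same job as the chained setdefault() calls, just with the membership checks written out explicitly.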

Answer 2 (score: 0)

I think this does the job. In fact, you had almost got there yourself.
