I'm moving from Perl to Python, and I'm struggling with a hash of arrays. I get this data structure back from a REST service:
[
  {
    "gene": "ENSG00000270076",
    "minus_log10_p_value": 0.0271298550085406,
    "tissue": "Thyroid",
    "value": 0.939442373223424
  },
  {
    "gene": "ENSG00000104643",
    "minus_log10_p_value": 0.255628260060896,
    "tissue": "Thyroid",
    "value": 0.555100655197016
  }
]
In Perl terms, I want to parse it and build the Python equivalent of
${$tissue}{$value} = [$gene]
${Thyroid}{0.5555} = [ENSG1, ENSG2, ENSG3]
In Python, I tried a few things along these lines:
d={}
d[hit['tissue']][hit['value']].append(hit[gene])
but I ran into all sorts of errors.
In the end, I'd like d to look like:
{
    'Thyroid': {
        0.939442373223424: ['ENSG00000270076'],
        0.555100655197016: ['ENSG00000104643']
    }
}
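That is, grouped by tissue, then by value, with each value holding a list of genes.
Answer 0 (score: 2)
You can use the dict.setdefault() method to insert nested data structures for missing keys. Because the method returns either the value already stored under the key or the newly inserted default, you can chain the calls: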
d = {}
for hit in list_of_hits:
    tissue, value, gene = hit['tissue'], hit['value'], hit['gene']
    d.setdefault(tissue, {}).setdefault(value, []).append(gene)
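So for each d[tissue] key, this makes sure a nested dictionary exists. For each d[tissue][value] key pair, it makes sure a nested list exists, and then appends the gene to that list.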
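To see why the chain works, here is a small sketch that unrolls it into separate steps. The tissue, value, and gene literals are placeholders chosen for illustration, not data from the question.

```python
d = {}

# First call: "Thyroid" is missing, so {} is inserted and returned.
by_value = d.setdefault("Thyroid", {})
assert by_value is d["Thyroid"]  # the same object, not a copy

# Second call on the inner dict: 0.5 is missing, so [] is inserted and returned.
genes = by_value.setdefault(0.5, [])
genes.append("ENSG1")

# Repeating the chained calls finds the existing containers
# instead of replacing them.
d.setdefault("Thyroid", {}).setdefault(0.5, []).append("ENSG2")

print(d)  # {'Thyroid': {0.5: ['ENSG1', 'ENSG2']}}
```

Because setdefault() hands back the stored object itself, appending to the returned list changes what is stored in d.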
Demo:
>>> list_of_hits = [
...     {
...         "gene": "ENSG00000270076",
...         "minus_log10_p_value": 0.0271298550085406,
...         "tissue": "Thyroid",
...         "value": 0.939442373223424
...     },
...     {
...         "gene": "ENSG00000104643",
...         "minus_log10_p_value": 0.255628260060896,
...         "tissue": "Thyroid",
...         "value": 0.555100655197016
...     }
... ]
>>> d = {}
>>> for hit in list_of_hits:
...     tissue, value, gene = hit['tissue'], hit['value'], hit['gene']
...     d.setdefault(tissue, {}).setdefault(value, []).append(gene)
...
>>> d
{'Thyroid': {0.939442373223424: ['ENSG00000270076'], 0.555100655197016: ['ENSG00000104643']}}
>>> from pprint import pprint
>>> pprint(d)
{'Thyroid': {0.555100655197016: ['ENSG00000104643'],
             0.939442373223424: ['ENSG00000270076']}}
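If you want something closer to Perl's autovivification, the same grouping can also be sketched with collections.defaultdict from the standard library. This is an alternative to the setdefault() chain, not part of it; the hit records are trimmed to the three fields that are used, and the name plain is introduced here only for illustration.

```python
from collections import defaultdict

list_of_hits = [
    {"gene": "ENSG00000270076", "tissue": "Thyroid", "value": 0.939442373223424},
    {"gene": "ENSG00000104643", "tissue": "Thyroid", "value": 0.555100655197016},
]

# The outer dict creates an inner defaultdict(list) for any missing tissue;
# the inner dict creates an empty list for any missing value.
d = defaultdict(lambda: defaultdict(list))
for hit in list_of_hits:
    d[hit["tissue"]][hit["value"]].append(hit["gene"])

# Convert back to plain dicts so that later lookups of missing keys
# raise KeyError instead of silently creating entries.
plain = {tissue: dict(by_value) for tissue, by_value in d.items()}
print(plain)
```

The trade-off is that a defaultdict also creates entries on plain reads of missing keys, which is why the sketch converts back to ordinary dicts at the end.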
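Be aware that floating point values can be imprecise, so you may need to apply some rounding to normalize them before using them as keys. For example, 0.555100655197016 and 0.555100655197017 are very close, but not equal: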
>>> 0.555100655197016 == 0.555100655197017
False
You can simply apply the round() function to value, using a number of digits that is still meaningful for your application:
d = {}
for hit in list_of_hits:
    tissue, value, gene = hit['tissue'], hit['value'], hit['gene']
    value = round(value, 4)
    d.setdefault(tissue, {}).setdefault(value, []).append(gene)
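As a rough illustration of what the rounding buys you, the sketch below groups two hits whose values differ only in the last digit, once with exact keys and once rounded to 4 digits. The gene names GENE_A and GENE_B and the helper group() are made up for this example.

```python
hits = [
    {"gene": "GENE_A", "tissue": "Thyroid", "value": 0.555100655197016},
    {"gene": "GENE_B", "tissue": "Thyroid", "value": 0.555100655197017},
]

def group(hits, ndigits=None):
    """Group genes by tissue, then by value (optionally rounded)."""
    d = {}
    for hit in hits:
        value = hit["value"] if ndigits is None else round(hit["value"], ndigits)
        d.setdefault(hit["tissue"], {}).setdefault(value, []).append(hit["gene"])
    return d

exact = group(hits)               # exact floats: two separate keys
rounded = group(hits, ndigits=4)  # rounded: both genes share one key

print(len(exact["Thyroid"]))  # 2
print(rounded)                # {'Thyroid': {0.5551: ['GENE_A', 'GENE_B']}}
```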
Answer 1 (score: 1)
You can get output in the desired format with a plain for loop!
>>> l = [{'minus_log10_p_value': 0.0271298550085406, 'gene': 'ENSG00000270076', 'tissue': 'Thyroid', 'value': 0.939442373223424}, {'minus_log10_p_value': 0.255628260060896, 'gene': 'ENSG00000104643', 'tissue': 'Thyroid', 'value': 0.555100655197016}]
>>> res = {}
>>> for each in l:
...     if each['tissue'] not in res:
...         res[each['tissue']] = {each['value']: each['gene']}
...     else:
...         res[each['tissue']][each['value']] = each['gene']
...
>>> res
{'Thyroid': {0.555100655197016: 'ENSG00000104643', 0.939442373223424: 'ENSG00000270076'}}
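Note that this result maps each value to a single gene string, so a second gene with the same tissue and value would overwrite the first. A possible variant of the same explicit-check loop that collects genes into lists, as the question asks for, might look like this (the records are trimmed to the three fields used):

```python
l = [
    {"gene": "ENSG00000270076", "tissue": "Thyroid", "value": 0.939442373223424},
    {"gene": "ENSG00000104643", "tissue": "Thyroid", "value": 0.555100655197016},
]

res = {}
for each in l:
    tissue, value = each["tissue"], each["value"]
    # Create the nested dict and the list on first sight of each key.
    if tissue not in res:
        res[tissue] = {}
    if value not in res[tissue]:
        res[tissue][value] = []
    res[tissue][value].append(each["gene"])

print(res)
```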
Answer 2 (score: 0)
I think this does the job. In fact, you were almost there.