Nested dictionary from CSV in Python

Posted: 2014-07-28 21:10:17

Tags: python csv dictionary

I have this dataset:

Epitope,ID,Frequency,Assay
AVNIVGYSNAQGVDY,123431,27.0,Tetramer
DIKYTWNVPKI,887473,50.0,3H
LRQMRTVTPIRMQGG,34234,11.9,Elispot
AVNIVGYSNAQGVDY,3456,67.0,Tetramer

I would like to know how to build and output something like this:

d = {'AVNIVGYSNAQGVDY': {'ID': [123431, 3456], 'Frequency': [27.0, 67.0], 'Assay': ['Tetramer']},
     'DIKYTWNVPKI': {'ID': [887473], 'Frequency': [50.0], 'Assay': ['3H']},
     'LRQMRTVTPIRMQGG': {'ID': [34234], 'Frequency': [11.9], 'Assay': ['Elispot']}}

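That is, a dictionary with each unique epitope as a key, whose value holds one list per category (ID, Frequency and Assay) collecting the values from the repeated rows, as you can see.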

I can read the file using this code:

result = {}
for row in reader:
    dictlist = []
    key = row.pop('Epitope')
    if key in result:
        pass
    result[key] = row
print result

But I don't know how to handle the duplicates: when an epitope appears more than once, how do I append its ID, Frequency and Assay values?

1 answer:

Answer 0 (score: 1)

You need to use lists as the values and, for every row, append each column's value to the list for that key:

from collections import defaultdict

result = defaultdict(lambda: defaultdict(list))

for row in reader:
    epitope = row.pop('Epitope')
    entry = result[epitope]
    for key, value in row.items():
        entry[key].append(value)
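If you would rather end up with ordinary dicts (no defaultdict in the printed output), the same per-key, per-row appending can be sketched with dict.setdefault. This is a minimal Python 3 sketch, not part of the original answer; the SAMPLE string and the group_by_epitope helper are hypothetical names, and with a real file you would pass a file object opened with newline='' to csv.DictReader instead of io.StringIO.

```python
import csv
import io

# The question's data, inlined for illustration (assumption: in practice
# this would come from a file opened with newline='').
SAMPLE = """\
Epitope,ID,Frequency,Assay
AVNIVGYSNAQGVDY,123431,27.0,Tetramer
DIKYTWNVPKI,887473,50.0,3H
LRQMRTVTPIRMQGG,34234,11.9,Elispot
AVNIVGYSNAQGVDY,3456,67.0,Tetramer
"""


def group_by_epitope(csv_file):
    """Return {epitope: {column: [values, ...]}} using only built-in dicts."""
    result = {}
    for row in csv.DictReader(csv_file):
        epitope = row.pop('Epitope')
        # setdefault inserts an empty dict the first time an epitope is seen
        entry = result.setdefault(epitope, {})
        for key, value in row.items():
            # ...and an empty list the first time a column appears for it
            entry.setdefault(key, []).append(value)
    return result


result = group_by_epitope(io.StringIO(SAMPLE))
print(result['AVNIVGYSNAQGVDY']['ID'])  # ['123431', '3456']
```

The trade-off is only cosmetic: setdefault builds the empty containers eagerly on every call, while defaultdict creates them lazily, but both produce the same grouping.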

Demo:

>>> from collections import defaultdict
>>> import csv
>>> sample = '''\
... Epitope,ID,Frequency,Assay
... AVNIVGYSNAQGVDY,123431,27.0,Tetramer
... DIKYTWNVPKI,887473,50.0,3H
... LRQMRTVTPIRMQGG,34234,11.9,Elispot
... AVNIVGYSNAQGVDY,3456,67.0,Tetramer
... '''
>>> reader = csv.DictReader(sample.splitlines())
>>> result = defaultdict(lambda: defaultdict(list))
>>> for row in reader:
...     epitope = row.pop('Epitope')
...     entry = result[epitope]
...     for key, value in row.items():
...         entry[key].append(value)
... 
>>> from pprint import pprint
>>> for key, value in result.items():
...     print key, dict(value)
... 
AVNIVGYSNAQGVDY {'Frequency': ['27.0', '67.0'], 'Assay': ['Tetramer', 'Tetramer'], 'ID': ['123431', '3456']}
DIKYTWNVPKI {'Frequency': ['50.0'], 'Assay': ['3H'], 'ID': ['887473']}
LRQMRTVTPIRMQGG {'Frequency': ['11.9'], 'Assay': ['Elispot'], 'ID': ['34234']}
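The demo keeps every value as the string read from the CSV. To get closer to the output requested in the question (numeric IDs and frequencies, plain dicts), you could post-process the grouped result. This is a Python 3 sketch under stated assumptions: ID is taken to be an int, Frequency a float, and repeated Assay values are collapsed because the question's desired output lists 'Tetramer' only once. CONVERTERS and to_plain are hypothetical names.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """\
Epitope,ID,Frequency,Assay
AVNIVGYSNAQGVDY,123431,27.0,Tetramer
DIKYTWNVPKI,887473,50.0,3H
LRQMRTVTPIRMQGG,34234,11.9,Elispot
AVNIVGYSNAQGVDY,3456,67.0,Tetramer
"""

# Hypothetical per-column converters; columns not listed stay as strings.
CONVERTERS = {'ID': int, 'Frequency': float}


def to_plain(grouped, dedupe=('Assay',)):
    """Convert nested defaultdicts into plain dicts with typed values.

    Columns named in `dedupe` keep only the first occurrence of each
    value, preserving order.
    """
    plain = {}
    for epitope, columns in grouped.items():
        entry = {}
        for column, values in columns.items():
            convert = CONVERTERS.get(column, str)
            converted = [convert(v) for v in values]
            if column in dedupe:
                # dict.fromkeys drops repeats while keeping insertion order
                converted = list(dict.fromkeys(converted))
            entry[column] = converted
        plain[epitope] = entry
    return plain


# Grouping step, as in the answer's code
result = defaultdict(lambda: defaultdict(list))
for row in csv.DictReader(io.StringIO(SAMPLE)):
    epitope = row.pop('Epitope')
    for key, value in row.items():
        result[epitope][key].append(value)

d = to_plain(result)
print(d['AVNIVGYSNAQGVDY'])
# {'ID': [123431, 3456], 'Frequency': [27.0, 67.0], 'Assay': ['Tetramer']}
```

Pass dedupe=() if you want to keep every Assay entry, as the original demo does.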