Data extraction: creating a dictionary of dictionaries with lists in Python

Time: 2016-10-07 13:53:14

Tags: python dictionary

I have data in a file that looks like this:

Name, Age, Sex, School, height, weight, id

Joe, 10, M, StThomas, 120, 20, 111

Jim, 9, M, StThomas, 126, 22, 123

Jack, 8, M, StFrancis, 110, 15, 145

Abel, 10, F, StFrancis, 128, 23, 166

The actual data may have 100 columns and 1 million rows.

What I want to do is create a dictionary with the following pattern:

school_data = {'StThomas': {'weight':[20,22], 'height': [120,126]},
               'StFrancis': {'weight':[15,23], 'height': [110,128]} }

Things I have tried:

  1. Trial 1 (computationally very expensive):

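    # First pass: collect the distinct school names (column index 3).
    school_names = set()
    for lines in read_data[1:]:
        data = lines.split('\t')
        school_names.add(data[3])

    # Second pass: scan every line once per school, which is what makes this slow.
    for lines in read_data[1:]:
        for school in school_names:
            if school in lines:
                print(lines)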
    
  2. Trial 2:

    for lines in read_data[1:]:
        data = lines.split('\t')
        school_name = data[3]
        height = data[4]
        weight = data[5]
        id = data [6]
        x[id] = {school_name: (weight, height)}
    
  3. The two approaches above are how I tried to proceed, but neither got me close to a solution.

1 answer:

Answer 0 (score: 1):

The simplest way to do this in the standard library is with the existing tools csv.DictReader and collections.defaultdict:

from collections import defaultdict
from csv import DictReader

data = defaultdict(lambda: defaultdict(list))  # *

with open(datafile) as file_:
    for row in DictReader(file_):
        data[row[' School'].strip()]['height'].append(int(row[' height']))
        data[row[' School'].strip()]['weight'].append(int(row[' weight']))

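Note that the key ' School' (with its leading space) and the call to .strip() are both needed because the input file has a space after each comma, starting with the header row. Result: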

>>> data
defaultdict(<function <lambda> at 0x10261c0c8>, {'StFrancis': defaultdict(<type 'list'>, {'weight': [15, 23], 'height': [110, 128]}), 'StThomas': defaultdict(<type 'list'>, {'weight': [20, 22], 'height': [120, 126]})})
>>> data['StThomas']['height']
[120, 126]
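
If the spaces after the commas are the only obstacle, one alternative is to let the csv module drop them. The following is a minimal sketch, assuming the file is comma-separated as in the sample; build_school_data and the inlined SAMPLE are illustrative names, not part of the original answer. skipinitialspace is a standard csv dialect parameter that DictReader passes through to the underlying reader.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question; a real run would pass an open file instead.
SAMPLE = """Name, Age, Sex, School, height, weight, id
Joe, 10, M, StThomas, 120, 20, 111
Jim, 9, M, StThomas, 126, 22, 123
Jack, 8, M, StFrancis, 110, 15, 145
Abel, 10, F, StFrancis, 128, 23, 166
"""

def build_school_data(lines, fields=("height", "weight")):
    """Group the chosen numeric columns by school (hypothetical helper)."""
    data = defaultdict(lambda: defaultdict(list))
    # skipinitialspace=True discards whitespace right after each delimiter,
    # so the keys are 'School', 'height', ... without a leading space.
    for row in csv.DictReader(lines, skipinitialspace=True):
        school = row["School"]
        for field in fields:
            data[school][field].append(int(row[field]))
    return data

school_data = build_school_data(io.StringIO(SAMPLE))
print(school_data["StThomas"]["height"])  # [120, 126]
```

Because the rows are read one at a time, memory use is driven by the collected lists rather than by the size of the file, which matters with around a million rows.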

Alternatively, if you plan to do further analysis, look into something like pandas and its DataFrame data structure.
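
As a rough idea of what the pandas route could look like, here is a sketch that assumes pandas is installed and that the file matches the sample; SAMPLE and the variable names are illustrative, not part of the original answer.

```python
import io

import pandas as pd

# Sample rows copied from the question; a real run would pass a file path instead.
SAMPLE = """Name, Age, Sex, School, height, weight, id
Joe, 10, M, StThomas, 120, 20, 111
Jim, 9, M, StThomas, 126, 22, 123
Jack, 8, M, StFrancis, 110, 15, 145
Abel, 10, F, StFrancis, 128, 23, 166
"""

# skipinitialspace=True drops the blanks after each comma in header and values.
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)

# Collect each selected column into a list per school.
grouped = df.groupby("School")[["height", "weight"]].agg(list)

# One inner dict per school, keyed by column name.
school_data = grouped.to_dict("index")
print(school_data["StThomas"])
```

Keeping the DataFrame around instead of converting to a dictionary may be the better choice if the next step is statistics per school, since groupby can compute those directly.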

* If this looks odd, see "Python defaultdict and lambda".
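
The idea behind the footnote can be shown in a few lines: defaultdict expects a callable that takes no arguments, so the lambda builds a fresh inner defaultdict(list) for every new school. This is only a sketch; to_plain_dict is a hypothetical helper for turning the result into the plain nested dictionary shown in the question.

```python
from collections import defaultdict

# The outer factory must be a callable taking no arguments; the lambda
# returns a new defaultdict(list) each time an unseen key is accessed.
data = defaultdict(lambda: defaultdict(list))
data["StThomas"]["height"].append(120)  # both levels are created on demand
data["StThomas"]["height"].append(126)
data["StFrancis"]["weight"].append(15)

def to_plain_dict(nested):
    """Convert the two-level defaultdict into ordinary dicts (hypothetical helper)."""
    return {school: dict(columns) for school, columns in nested.items()}

plain = to_plain_dict(data)
print(plain["StThomas"])  # {'height': [120, 126]}
```

Converting to plain dicts at the end also means a later lookup of a misspelled school name raises KeyError instead of silently creating an empty entry.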