Question

I have data in a file that looks like this:

Name, Age, Sex, School, height, weight, id

Joe, 10, M, StThomas, 120, 20, 111

Jim, 9, M, StThomas, 126, 22, 123

Jack, 8, M, StFrancis, 110, 15, 145

Abel, 10, F, StFrancis, 128, 23, 166

The real data may have 100 columns and 1 million rows.

What I want to do is build a dictionary with the following structure:

school_data = {'StThomas': {'weight':[20,22], 'height': [120,126]},
               'StFrancis': {'weight':[15,23], 'height': [110,128]} }

What I have tried:

Attempt 1 (computationally very expensive):

school_names  = []
for lines in read_data[1:]:
    data = lines.split('\t')
    school_names.append(data[3])

school_names = set(school_names)

for lines in read_data[1:]:
    for school in school_names:
        if school in lines:
            print lines

Attempt 2:

for lines in read_data[1:]:
    data = lines.split('\t')
    school_name = data[3]
    height = data[4]
    weight = data[5]
    id = data[6]
    x[id] = {school_name: (weight, height)}

These are the two approaches I have tried so far, but neither has brought me close to a solution.

Answer 1

The simplest way to do this with the standard library is to use the existing tools csv.DictReader and collections.defaultdict:

from collections import defaultdict
from csv import DictReader

data = defaultdict(lambda: defaultdict(list))  # *

with open(datafile) as file_:
    for row in DictReader(file_):
        data[row[' School'].strip()]['height'].append(int(row[' height']))
        data[row[' School'].strip()]['weight'].append(int(row[' weight']))

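Note that the leading space in keys such as ' School' and the call to .strip() are both necessary, because the input file has a space after each comma, in the header row as well as in the data rows. Result: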

>>> data
defaultdict(<function <lambda> at 0x10261c0c8>, {'StFrancis': defaultdict(<type 'list'>, {'weight': [15, 23], 'height': [110, 128]}), 'StThomas': defaultdict(<type 'list'>, {'weight': [20, 22], 'height': [120, 126]})})
>>> data['StThomas']['height']
[120, 126]
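If the leading-space keys feel awkward, one possible variation is to pass skipinitialspace=True to DictReader, so that whitespace directly after each delimiter is dropped for the header and the data rows alike. The following is only a sketch under a few assumptions: it targets Python 3, the helper name group_by_school and the inline SAMPLE string are made up for illustration, and the columns to collect are passed in rather than hard-coded, which may help with a 100-column file.

```python
import csv
import io
from collections import defaultdict

# Illustrative stand-in for the real file (assumption: same layout as the question).
SAMPLE = """Name, Age, Sex, School, height, weight, id
Joe, 10, M, StThomas, 120, 20, 111
Jim, 9, M, StThomas, 126, 22, 123
Jack, 8, M, StFrancis, 110, 15, 145
Abel, 10, F, StFrancis, 128, 23, 166
"""

def group_by_school(lines, columns=("weight", "height")):
    """Collect the chosen numeric columns into lists, keyed by school."""
    data = defaultdict(lambda: defaultdict(list))
    # skipinitialspace=True drops the space after each comma, so the
    # header keys become 'School', 'height', ... without a leading space.
    for row in csv.DictReader(lines, skipinitialspace=True):
        for column in columns:
            data[row["School"]][column].append(int(row[column]))
    return data

school_data = group_by_school(io.StringIO(SAMPLE))
```

With a real file, the same function could be called as group_by_school(open(datafile)) inside a with block. Rows are read one at a time, so the whole file never has to be held in memory as a list of lines.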

Alternatively, if you plan to do further analysis, take a look at pandas and its DataFrame data structure.
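As a rough idea of what the pandas route might look like, here is a minimal sketch. It assumes Python 3 with pandas installed, uses an inline SAMPLE string in place of the real file, and the groupby-plus-comprehension shape is just one of several ways to get the nested dictionary.

```python
import io

import pandas as pd

# Illustrative stand-in for the real file (assumption: same layout as the question).
SAMPLE = """Name, Age, Sex, School, height, weight, id
Joe, 10, M, StThomas, 120, 20, 111
Jim, 9, M, StThomas, 126, 22, 123
Jack, 8, M, StFrancis, 110, 15, 145
Abel, 10, F, StFrancis, 128, 23, 166
"""

# skipinitialspace=True removes the space after each comma,
# so the column names and the school names come out clean.
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)

# One inner dict per school, with each column converted to a plain list.
school_data = {
    school: {col: group[col].tolist() for col in ("weight", "height")}
    for school, group in df.groupby("School")
}
```

Once the data is in a DataFrame, per-school statistics such as df.groupby("School")["height"].mean() are also available without building the dictionary first.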

* If this looks strange, see "Python defaultdict and lambda".
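For readers who have not seen the nested pattern before, a small sketch of how defaultdict(lambda: defaultdict(list)) behaves may help. The variable name nested and the values appended here are illustrative only.

```python
from collections import defaultdict

# The outer defaultdict calls the lambda for every missing key, and the
# lambda returns a fresh inner defaultdict whose missing keys become lists.
nested = defaultdict(lambda: defaultdict(list))

# Neither 'StThomas' nor 'height' exists yet; both are created on first access.
nested["StThomas"]["height"].append(120)
nested["StThomas"]["height"].append(126)
nested["StFrancis"]["weight"].append(15)

# Plain dicts would need explicit setup for the same effect:
plain = {}
plain.setdefault("StThomas", {}).setdefault("height", []).append(120)
```

One thing to keep in mind is that merely looking up a missing key, for example nested["Unknown"], also creates it, so membership checks with in are safer than indexing when you only want to test for a school.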
