I have data in a file that looks like this:
Name, Age, Sex, School, height, weight, id
Joe, 10, M, StThomas, 120, 20, 111
Jim, 9, M, StThomas, 126, 22, 123
Jack, 8, M, StFrancis, 110, 15, 145
Abel, 10, F, StFrancis, 128, 23, 166
The real data may have 100 columns and 1 million rows.
What I want to do is build a dictionary with the following structure:
school_data = {'StThomas': {'weight': [20, 22], 'height': [120, 126]},
               'StFrancis': {'weight': [15, 23], 'height': [110, 128]}}
What I have tried:
Attempt 1 (computationally very expensive):
school_names = []
for lines in read_data[1:]:
    data = lines.split('\t')
    school_names.append(data[3])
school_names = set(school_names)
for lines in read_data[1:]:
    for school in school_names:
        if school in lines:
            print(lines)
Attempt 2:
for lines in read_data[1:]:
    data = lines.split('\t')
    school_name = data[3]
    height = data[4]
    weight = data[5]
    id = data[6]
    x[id] = {school_name: (weight, height)}
These are the two approaches I tried, but neither got me close to a solution.
Answer 0 (score: 1)
The simplest way to do this with the standard library is to use the existing tools csv.DictReader and collections.defaultdict:
from collections import defaultdict
from csv import DictReader

data = defaultdict(lambda: defaultdict(list))  # *
with open(datafile) as file_:
    for row in DictReader(file_):
        data[row[' School'].strip()]['height'].append(int(row[' height']))
        data[row[' School'].strip()]['weight'].append(int(row[' weight']))
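Note that e.g. ' School' (with a leading space) and .strip() are needed because the input file has a space after each comma, in the header row and in the data rows alike. Result: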
>>> data
defaultdict(<function <lambda> at 0x10261c0c8>, {'StFrancis': defaultdict(<type 'list'>, {'weight': [15, 23], 'height': [110, 128]}), 'StThomas': defaultdict(<type 'list'>, {'weight': [20, 22], 'height': [120, 126]})})
>>> data['StThomas']['height']
[120, 126]
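As a possible alternative to the ' School' keys and the .strip() call, the csv module's skipinitialspace option drops the whitespace that follows each delimiter. The sketch below is only an illustration: it assumes the file is comma-separated exactly as in the question, and it uses an in-memory io.StringIO with the four sample rows in place of the real file.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    "Name, Age, Sex, School, height, weight, id\n"
    "Joe, 10, M, StThomas, 120, 20, 111\n"
    "Jim, 9, M, StThomas, 126, 22, 123\n"
    "Jack, 8, M, StFrancis, 110, 15, 145\n"
    "Abel, 10, F, StFrancis, 128, 23, 166\n"
)

school_data = defaultdict(lambda: defaultdict(list))

# skipinitialspace=True removes the space after each comma, so the header
# keys become 'School', 'height', 'weight' with no leading space.
for row in csv.DictReader(sample, skipinitialspace=True):
    school = row["School"]
    school_data[school]["height"].append(int(row["height"]))
    school_data[school]["weight"].append(int(row["weight"]))
```

Because the reader yields one row at a time, memory use is driven by the collected lists rather than by the size of the file, which may matter at a million rows.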
Alternatively, if you plan to do further analysis, have a look at something like pandas and its DataFrame data structure.
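For the pandas route, a rough sketch might look like the following. It assumes pandas is installed and that the file is comma-separated as in the question. The sample rows are read from an in-memory string here, and the choice of groupby plus agg(list) is one possible way to get the requested shape, not the only one.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    "Name, Age, Sex, School, height, weight, id\n"
    "Joe, 10, M, StThomas, 120, 20, 111\n"
    "Jim, 9, M, StThomas, 126, 22, 123\n"
    "Jack, 8, M, StFrancis, 110, 15, 145\n"
    "Abel, 10, F, StFrancis, 128, 23, 166\n"
)

# skipinitialspace=True strips the space after each comma in header and data.
df = pd.read_csv(sample, skipinitialspace=True)

# Collect the height and weight values of each school into lists,
# then turn the result into {school: {column: list}}.
grouped = df.groupby("School")[["height", "weight"]].agg(list)
school_data = grouped.to_dict(orient="index")
```

If the goal is statistics per school rather than the raw lists, it may be simpler to stay in the DataFrame and call an aggregation such as mean() on the grouped object instead of building a dictionary.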
* If this looks strange, see "Python defaultdict and lambda".
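To make the footnoted line less mysterious, here is a small sketch of what the nested defaultdict does. defaultdict calls its factory with no arguments whenever a missing key is accessed, so the lambda hands back a fresh inner defaultdict(list) for each new school. The to_plain helper is a hypothetical addition, shown only in case an ordinary dict is wanted at the end.

```python
from collections import defaultdict

# The outer factory (the lambda) runs the first time a new key is accessed
# and returns a fresh inner defaultdict(list) for that key.
nested = defaultdict(lambda: defaultdict(list))

# Neither 'StThomas' nor 'height' exists yet; both levels are created on demand.
nested["StThomas"]["height"].append(120)
nested["StThomas"]["height"].append(126)


def to_plain(d):
    """Hypothetical helper: recursively convert nested defaultdicts to dicts."""
    return {k: to_plain(v) if isinstance(v, dict) else v for k, v in d.items()}


plain = to_plain(nested)
```

Converting to a plain dict also means that looking up a misspelled school name raises KeyError instead of silently creating an empty entry.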