I have a CSV file that looks like this:
compound, x1data,y1data,x2data,y2data
a,1,2,3,4
a,9,10,11,12
b,5,6,7,8
b,4,5,6,7
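I want to create a dictionary of lists in which the compound is the key, and for each compound I get lists of x1data, y1data, x2data and y2data.
I believe it would look like this: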
my_dict = {
'a': {'x1data':[1,9],'y1data':[2,10],'x2data':[3,11],'y2data':[4,12]},
'b':{'x1data':[5,4],'y1data':[6,5],'x2data':[7,6],'y2data':[8,7]}
}
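Ultimately, I want to plot x1data vs. y1data and x2data vs. y2data for each compound.
I tried the following. It correctly builds a dictionary keyed by compound, but it does not give me lists of values (only the values from the last matching row in the CSV).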
my_dict = {}
with open(filename, 'r') as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        key = row.pop('compound')
        my_dict[key] = row
Answer 0 (score: 0)
Here is one way to do it without any libraries.
mydict = {}
with open('f.csv', 'r') as f:
    next(f)  # skip the header row
    for row in f:
        compound, x1data, y1data, x2data, y2data = row.strip().split(',')
        x1data, y1data, x2data, y2data = int(x1data), int(y1data), int(x2data), int(y2data)
        # create the per-compound lists the first time a compound is seen
        if compound not in mydict:
            mydict[compound] = {'x1data': [], 'y1data': [], 'x2data': [], 'y2data': []}
        mydict[compound]['x1data'].append(x1data)
        mydict[compound]['y1data'].append(y1data)
        mydict[compound]['x2data'].append(x2data)
        mydict[compound]['y2data'].append(y2data)
print(mydict)
This gives you (the key order may differ depending on the Python version):
{'a': {'x2data': [3, 11], 'y2data': [4, 12], 'y1data': [2, 10], 'x1data': [1, 9]}, 'b': {'x2data': [7, 6], 'y2data': [8, 7], 'y1data': [6, 5], 'x1data': [5, 4]}}
Answer 1 (score: 0)
You can use collections.defaultdict from the standard library.
from collections import defaultdict as dd
import csv
my_dict = dd(lambda: dd(list))
with open("test.csv", 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for key in reader.fieldnames[1:]:
            my_dict[row.get("compound")][key].append(row[key])
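Technically, what you get is not a plain dict but a defaultdict of defaultdicts. You can use it the same way.
Printing it takes a little more work: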
from pprint import pprint
# ...
pprint({k: dict(v) for k, v in dict(my_dict).items()})
This gives:
{'a': {'x1data': ['1', '9'],
       'x2data': ['3', '11'],
       'y1data': ['2', '10'],
       'y2data': ['4', '12']},
 'b': {'x1data': ['5', '4'],
       'x2data': ['7', '6'],
       'y1data': ['6', '5'],
       'y2data': ['8', '7']}}
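The values above are strings because csv.DictReader does not convert types. The following is a minimal sketch of one way to get integers instead, assuming every data column holds integer text. The inline sample data and the function name `build_dict` are illustrative and not part of the original answer. It also passes `skipinitialspace=True` so that the space after the first comma in the question's header does not end up inside a key.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question (note the space before "x1data").
SAMPLE = """compound, x1data,y1data,x2data,y2data
a,1,2,3,4
a,9,10,11,12
b,5,6,7,8
b,4,5,6,7
"""

def build_dict(infile):
    """Group the data columns by compound, converting each value to int."""
    grouped = defaultdict(lambda: defaultdict(list))
    # skipinitialspace=True drops whitespace that directly follows a comma.
    reader = csv.DictReader(infile, skipinitialspace=True)
    for row in reader:
        compound = row["compound"]
        for key in reader.fieldnames[1:]:
            grouped[compound][key].append(int(row[key]))
    # Convert to plain dicts so the result prints and compares normally.
    return {k: dict(v) for k, v in grouped.items()}

my_dict = build_dict(io.StringIO(SAMPLE))
print(my_dict)
```

With a real file, the same function could be called as `build_dict(open_file)` inside a `with open(...)` block. If a column can contain decimals, `float` would be the safer conversion.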
Answer 2 (score: 0)
Here is a solution that does not depend on the csv library and should work with a header of any size.
with open("dat.csv", 'r') as f:
    lines = f.read().splitlines()
headers = lines.pop(0).split(",")[1:]  # names of the columns
results = {}
for line in lines:
    line = line.split(",")
    if line[0] not in results:
        results[line[0]] = {header: [] for header in headers}
    for i, header in enumerate(headers):
        results[line[0]][header].append(line[i+1])
        # for ints: results[line[0]][header].append(int(line[i+1]))
print(results)
Output:
{'a': {'x2data': ['3', '11'], 'y2data': ['4', '12'], 'y1data': ['2', '10'], 'x1data': ['1', '9']}, 'b': {'x2data': ['7', '6'], 'y2data': ['8', '7'], 'y1data': ['6', '5'], 'x1data': ['5', '4']}}
The only change I made was to remove a space from the provided header (it works either way).
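If the header is left as posted, the first data key becomes " x1data" with a leading space. As a rough sketch of one way to avoid editing the file, the header names and fields can be stripped while parsing. The function name `parse_lines` and the inline sample lines are illustrative assumptions, and the values are converted with int as in the commented-out variant above.

```python
def parse_lines(lines):
    """Build {compound: {column: [values]}} from CSV text lines."""
    # Strip whitespace so " x1data" and "x1data" produce the same key.
    headers = [h.strip() for h in lines[0].split(",")][1:]
    results = {}
    for line in lines[1:]:
        fields = [f.strip() for f in line.split(",")]
        # Create the per-compound lists the first time a compound is seen.
        columns = results.setdefault(fields[0], {h: [] for h in headers})
        for header, value in zip(headers, fields[1:]):
            columns[header].append(int(value))
    return results

# Sample lines copied from the question, including the space in the header.
lines = [
    "compound, x1data,y1data,x2data,y2data",
    "a,1,2,3,4",
    "a,9,10,11,12",
    "b,5,6,7,8",
    "b,4,5,6,7",
]
results = parse_lines(lines)
print(results)
```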
Answer 3 (score: 0)
You can use itertools.groupby:
import csv, itertools
with open('filename.csv') as f:
    # first row is the header; hs holds the column names after "compound"
    [_, *hs], *data = csv.reader(f)
r = [(a, [list(map(int, i[1:])) for i in b]) for a, b in itertools.groupby(data, key=lambda x:x[0])]
final_result = {a:dict(zip(hs, map(list, zip(*b)))) for a, b in r}
Output:
{'a': {'x1data': [1, 9], 'y1data': [2, 10], 'x2data': [3, 11], 'y2data': [4, 12]}, 'b': {'x1data': [5, 4], 'y1data': [6, 5], 'x2data': [7, 6], 'y2data': [8, 7]}}
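Note that itertools.groupby only groups consecutive rows with the same key, so the approach above relies on each compound's rows being adjacent in the file, as they are in the question. A minimal sketch for the case where they are not, assuming it is acceptable to sort the rows by compound first; the interleaved sample rows and the function name `group_rows` are made up for illustration.

```python
import itertools

# Hypothetical rows in which the compounds are interleaved.
rows = [
    ["a", "1", "2", "3", "4"],
    ["b", "5", "6", "7", "8"],
    ["a", "9", "10", "11", "12"],
    ["b", "4", "5", "6", "7"],
]
headers = ["x1data", "y1data", "x2data", "y2data"]

def group_rows(rows, headers):
    """Return {compound: {header: [ints]}} regardless of row order."""
    # sorted() is stable, so rows of one compound keep their original order.
    ordered = sorted(rows, key=lambda r: r[0])
    result = {}
    for compound, group in itertools.groupby(ordered, key=lambda r: r[0]):
        # Transpose the group's rows into columns, converting values to int.
        columns = zip(*(map(int, r[1:]) for r in group))
        result[compound] = dict(zip(headers, map(list, columns)))
    return result

final_result = group_rows(rows, headers)
print(final_result)
```

Sorting costs an extra pass over the data, so for files that are already grouped it can be skipped.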