I have posted working code below. What it currently does:
it opens two .csv files, 'CMF.csv' and 'D65.csv', and then does some math on them.
Here is the simple structure of these files:
'CMF.csv' (wavelength, x, y, z):
400,1.879338E-02,2.589775E-03,8.508254E-02
410,8.277331E-02,1.041303E-02,3.832822E-01
420,2.077647E-01,2.576133E-02,9.933444E-01
...etc
'D65.csv' (wavelength, a, b):
400,82.7549,14.708
410,91.486,17.6753
420,93.4318,20.995
...etc
I have a third file, data.csv, with this structure (serialNumber, wavelength, measurement, name):
0,400,2.21,1
0,410,2.22,1
0,420,2.22,1
...
1,400,2.21,2
1,410,2.22,2
1,420,2.22,2
...etc
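What I would like is to write a few lines of code that do the math for every series in the last file (a series is defined by its serial number and name).
For example, I need a loop that, for each name or serial number and for each wavelength, computes:
x * a * measurement
I tried to load data.csv in a csv reader the same way as the other files, but I couldn't get it to work.
Any ideas?
Thanks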
import csv

# Load each file into a dict keyed by the value in its first column.
with open('CMF.csv') as cmf:
    reader = csv.reader(cmf)
    dict_cmf = dict()
    for row in reader:
        dict_cmf[float(row[0])] = row

with open('D65.csv') as d65:
    reader = csv.reader(d65)
    dict_d65 = dict()
    for row in reader:
        dict_d65[float(row[0])] = row

with open('data.csv') as sp:
    reader = csv.reader(sp)
    dict_sp = dict()
    for row in reader:
        dict_sp[float(row[0])] = row

X_total = 0
Y_total = 0
Z_total = 0

# Sum CMF * illuminant over 400..690 nm in 10 nm steps (range() excludes 700).
for i in range(400, 700, 10):
    X = float(dict_cmf[i][1]) * float(dict_d65[i][1])
    X_total = X_total + X
    Y = float(dict_cmf[i][2]) * float(dict_d65[i][1])
    Y_total = Y_total + Y
    Z = float(dict_cmf[i][3]) * float(dict_d65[i][1])
    Z_total = Z_total + Z

# Normalise so that Y of the white point is 100.
wp_X = 100 * X_total / Y_total
wp_Y = 100 * Y_total / Y_total
wp_Z = 100 * Z_total / Y_total

print(Y_total)
print("D65_CMF_2006_10_deg white point = ")
print(wp_X, wp_Y, wp_Z)
I get:
Traceback (most recent call last):
  File "C:\Users\gary\Documents\eclipse\Spectro\1illum_XYZ2006_D65_numpy.py", line 24, in <module>
    dict_sp[row[0]] = row
IndexError: list index out of range
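Answer 0 (score: 1)
One or more rows in data.csv do not contain what you think they do. Try wrapping the statement in a try ... except block to see which row causes the problem: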
with open('data.csv') as sp:
    reader = csv.reader(sp)
    dict_sp = dict()
    for row in reader:
        try:
            dict_sp[float(row[0])] = row
        except IndexError:
            # Show the offending row, then re-raise the original exception.
            print('The problematic row is:')
            print(row)
            raise
A proper debugger would also help in a case like this.
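A common cause of this kind of IndexError is a blank line in the file, which csv.reader returns as an empty list, so row[0] does not exist. Whether that is what happens in your data.csv is only a guess. The sketch below shows one way to skip empty lines and report rows with an unexpected number of fields. The helper name read_rows and the in-memory SAMPLE text are made up for illustration and stand in for the real file.

```python
import csv
import io

# Hypothetical stand-in for data.csv: note the blank line and the short row.
SAMPLE = "0,400,2.21,1\n\n0,410,2.22,1\n0,420\n1,400,2.21,2\n"

def read_rows(fileobj, expected_fields=4):
    """Return (good_rows, bad_rows); bad_rows holds (line_number, row) pairs."""
    good, bad = [], []
    reader = csv.reader(fileobj)
    for row in reader:
        if not row:
            # csv.reader yields [] for a completely empty line: skip it.
            continue
        if len(row) != expected_fields:
            # reader.line_num is the number of source lines read so far.
            bad.append((reader.line_num, row))
            continue
        good.append(row)
    return good, bad

good_rows, bad_rows = read_rows(io.StringIO(SAMPLE))
print(len(good_rows), "usable rows")
for line_num, row in bad_rows:
    print("line", line_num, "looks malformed:", row)
```

With a real file you would pass the object returned by open('data.csv', newline='') instead of the StringIO.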
pandas is probably the better approach, but if you want a plain-Python example, you could look at this one:
import csv
from collections import defaultdict

# data[wavelength] -> {'x': ..., 'y': ..., 'z': ..., 'a': ..., 'b': ...}
data = defaultdict(dict)
for fname, cols in [('CMF.csv', ('x', 'y', 'z')), ('D65.csv', ('a', 'b'))]:
    with open(fname) as ifile:
        reader = csv.reader(ifile)
        for row in reader:
            wl, values = int(row[0]), row[1:]
            data[wl].update(zip(cols, map(float, values)))

# measurements[serial][wavelength] -> {'measurement': ..., 'name': ...}
measurements = defaultdict(dict)
with open('data.csv') as ifile:
    reader = csv.reader(ifile)
    cols = ('measurement', 'name')
    for serial, wl, me, name in reader:
        measurements[int(serial)][int(wl)] = dict(zip(cols, (float(me), str(name))))

for serial in sorted(measurements.keys()):
    for wl in sorted(measurements[serial].keys()):
        me = measurements[serial][wl]['measurement']
        print(me * data[wl]['x'] * data[wl]['a'])
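This stores x, y, z, a and b in a dictionary nested inside a dictionary keyed by wavelength (there is no obvious reason to keep these values in separate dicts).
The measurements are stored in a two-level dictionary keyed by serial and then by wavelength. That way you can iterate over all series and all of their wavelengths, as shown in the second half of the code.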
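If what you need is one number per series rather than one per wavelength, the same nested structure can be summed directly. The following is a minimal sketch under that assumption. The dictionaries are filled in by hand with the sample rows from the question instead of being read from the files, and series_totals is a name chosen here for illustration.

```python
# Lookup table keyed by wavelength (values from the sample rows in the question).
data = {
    400: {'x': 1.879338E-02, 'a': 82.7549},
    410: {'x': 8.277331E-02, 'a': 91.486},
    420: {'x': 2.077647E-01, 'a': 93.4318},
}

# measurements[serial][wavelength] -> {'measurement': ..., 'name': ...}
measurements = {
    0: {400: {'measurement': 2.21, 'name': '1'},
        410: {'measurement': 2.22, 'name': '1'},
        420: {'measurement': 2.22, 'name': '1'}},
    1: {400: {'measurement': 2.21, 'name': '2'},
        410: {'measurement': 2.22, 'name': '2'},
        420: {'measurement': 2.22, 'name': '2'}},
}

# For each series, sum x * a * measurement over all of its wavelengths.
series_totals = {}
for serial, by_wavelength in sorted(measurements.items()):
    series_totals[serial] = sum(
        point['measurement'] * data[wl]['x'] * data[wl]['a']
        for wl, point in by_wavelength.items()
    )

for serial, total in series_totals.items():
    print(serial, total)
```

Keying the result by serial keeps it aligned with the measurements dictionary, so the name can still be looked up from any wavelength entry of that series.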
As for the specific calculation in your example, this structure makes it easy:
tot_x = sum(v['x'] * v['a'] for v in data.values())
tot_y = sum(v['y'] * v['a'] for v in data.values())
tot_z = sum(v['z'] * v['a'] for v in data.values())
wp_x = 100 * tot_x / tot_y
wp_y = 100 * tot_y / tot_y  # Sure this is correct? It will always be 100
wp_z = 100 * tot_z / tot_y
print(wp_x, wp_y, wp_z)  # roughly 798.56 100.0 3775.04 for the three sample rows
These are the dictionaries built from the input files in your question:
>>> from pprint import pprint
>>> pprint(dict(data))
{400: {'a': 82.7549,
       'b': 14.708,
       'x': 0.01879338,
       'y': 0.002589775,
       'z': 0.08508254},
 410: {'a': 91.486,
       'b': 17.6753,
       'x': 0.08277331,
       'y': 0.01041303,
       'z': 0.3832822},
 420: {'a': 93.4318,
       'b': 20.995,
       'x': 0.2077647,
       'y': 0.02576133,
       'z': 0.9933444}}
>>> pprint(dict(measurements))
{0: {400: {'measurement': 2.21, 'name': '1'},
     410: {'measurement': 2.22, 'name': '1'},
     420: {'measurement': 2.22, 'name': '1'}},
 1: {400: {'measurement': 2.21, 'name': '2'},
     410: {'measurement': 2.22, 'name': '2'},
     420: {'measurement': 2.22, 'name': '2'}}}
Answer 1 (score: 1)
You need pandas. You can read the files into pandas tables and then join them, replacing your code with the following:
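import pandas

cmf = pandas.read_csv('CMF.csv', names=['wavelength', 'x', 'y', 'z'])
d65 = pandas.read_csv('D65.csv', names=['wavelength', 'a', 'b'])
data = pandas.read_csv('data.csv', names=['serialNumber', 'wavelength', 'measurement', 'name'])

# Join the two lookup tables on wavelength, then attach them to every measurement row.
lookup = pandas.merge(cmf, d65, on='wavelength')
merged = pandas.merge(data, lookup, on='wavelength')

# Multiply each of x, y, z by a row-wise and sum over all wavelengths.
totals = lookup[['x', 'y', 'z']].mul(lookup['a'], axis=0).sum()
# Scale so that y is 100, matching the 100 * total / Y_total in the question.
wps = 100 * totals / totals['y']

print(totals['y'])
print("D65_CMF_2006_10_deg white point = ")
print(wps)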
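This does not yet cover the last part, where you want to compute an extra value for each measurement. You can do that by adding a column to merged, like this: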
merged['newcol'] = merged.x * merged.a * merged.measurement
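To get one result per series from that column, a groupby on the series keys is one option. The sketch below assumes you want the sum of newcol for each (serialNumber, name) pair. It reads the sample rows from the question out of in-memory strings rather than the real files, and per_series is a name made up here.

```python
import io

import pandas as pd

# Sample rows from the question, held in memory instead of the real CSV files.
CMF = ("400,1.879338E-02,2.589775E-03,8.508254E-02\n"
       "410,8.277331E-02,1.041303E-02,3.832822E-01\n"
       "420,2.077647E-01,2.576133E-02,9.933444E-01\n")
D65 = "400,82.7549,14.708\n410,91.486,17.6753\n420,93.4318,20.995\n"
DATA = ("0,400,2.21,1\n0,410,2.22,1\n0,420,2.22,1\n"
        "1,400,2.21,2\n1,410,2.22,2\n1,420,2.22,2\n")

cmf = pd.read_csv(io.StringIO(CMF), names=['wavelength', 'x', 'y', 'z'])
d65 = pd.read_csv(io.StringIO(D65), names=['wavelength', 'a', 'b'])
data = pd.read_csv(io.StringIO(DATA),
                   names=['serialNumber', 'wavelength', 'measurement', 'name'])

# Attach x, y, z, a, b to every measurement row via the shared wavelength.
lookup = pd.merge(cmf, d65, on='wavelength')
merged = pd.merge(data, lookup, on='wavelength')
merged['newcol'] = merged.x * merged.a * merged.measurement

# One summed value per series, a series being a (serialNumber, name) pair.
per_series = merged.groupby(['serialNumber', 'name'])['newcol'].sum()
print(per_series)
```

The result is a Series indexed by (serialNumber, name). Calling per_series.reset_index() turns it back into an ordinary table if that is easier to work with.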