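I need to perform a math operation on more than 100 columns of a .csv file. Right now I can only apply the operation to a single column. How can I apply it to all of them?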
import csv
import numpy as np

# Read the first column of the input file as floats (Python 3).
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    arr = []
    for col in reader:
        arr.append(float(col[0]))

with open('/file.csv', 'w', newline='') as f:
    fn = ['col0']
    writer = csv.DictWriter(f, fieldnames=fn)
    # Split the column into chunks of 66 values.
    chunks = [arr[x:x+66] for x in range(0, len(arr), 66)]
    group = []
    for i in range(len(chunks)):
        grp = chunks[i]
        grp = grp[6:]  # drop the first 6 values of each chunk
        group.append(grp)
    flat_group = []
    for x in range(len(group)):
        for y in range(len(group[x])):
            flat_group.append(group[x][y])
    avg = []
    # Regroup the remaining values into tuples of 6 and average each one.
    unflattened_grp = list(zip(*[iter(flat_group)]*6))
    for z in range(len(unflattened_grp)):
        avrg = sum(unflattened_grp[z])/len(unflattened_grp[z])
        avg.append(avrg)
    for row in avg:
        writer.writerow({'col0': row})
Answer 0 (score: 0)
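You can read the entire input file into memory and transpose its rows and columns. Then either apply the math operation to each row of the transposed array, or write the transposed data to another .csv file for further processing one row at a time.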
import csv

# Python 3: open csv files in text mode with newline=''
with open('numbers.csv', newline='') as f:
    reader = csv.reader(f)
    arr = [tuple(float(v) for v in row) for row in reader]

arr = zip(*arr)  # transpose rows and columns

with open('transposed_numbers.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(arr)

print('done')
Sample input file:
0,1,2,3,4
5,6,7,8,9
10,11,12,13,14
Sample output file:
0.0,5.0,10.0
1.0,6.0,11.0
2.0,7.0,12.0
3.0,8.0,13.0
4.0,9.0,14.0
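To tie the transpose back to the question, the per-column calculation can be wrapped in a function and applied to every column in turn. This is only a sketch, assuming Python 3: `chunk_averages` and `process_columns` are hypothetical names, and the defaults of 66, 6 and 6 are taken from the question's code rather than from anything known about the data.

```python
import csv


def chunk_averages(values, chunk=66, skip=6, group=6):
    """Split values into chunks, drop the first `skip` values of each chunk,
    then average the remaining values in consecutive groups of `group`."""
    kept = []
    for start in range(0, len(values), chunk):
        kept.extend(values[start + skip:start + chunk])
    # zip over one shared iterator yields complete groups only, so a
    # trailing partial group is discarded, as in the question's code.
    groups = zip(*[iter(kept)] * group)
    return [sum(g) / len(g) for g in groups]


def process_columns(in_file, out_file, **kwargs):
    """Apply chunk_averages to every column of in_file and write the
    results to out_file, one output column per input column."""
    rows = [[float(v) for v in row] for row in csv.reader(in_file) if row]
    columns = zip(*rows)  # transpose: one tuple per input column
    results = [chunk_averages(col, **kwargs) for col in columns]
    csv.writer(out_file).writerows(zip(*results))  # transpose back
```

Both arguments are open file objects, so a call would look like `process_columns(src, dst)` with `src` opened for reading and `dst` for writing, each with `newline=''`. The output keeps the column order of the input.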
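Another, slower option is to read the input file multiple times, processing a different column on each pass. This is an example of the classic trade-off between speed and memory usage.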
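The multi-pass option might look like the sketch below, assuming Python 3 and a non-empty input file in which every row has the same number of columns. `column_values` and `process_by_column` are hypothetical helper names, and `operation` stands in for whatever calculation is needed.

```python
import csv


def column_values(path, index):
    """Re-read the file and return only column `index` as floats."""
    with open(path, newline='') as f:
        return [float(row[index]) for row in csv.reader(f) if row]


def process_by_column(path, operation):
    """Apply `operation` to each column, one column in memory at a time."""
    with open(path, newline='') as f:
        n_columns = len(next(csv.reader(f)))  # width taken from the first row
    # One full pass over the file per column: slower, but lighter on memory.
    return [operation(column_values(path, i)) for i in range(n_columns)]
```

With the sample input file above saved as numbers.csv, `process_by_column('numbers.csv', sum)` returns `[15.0, 18.0, 21.0, 24.0, 27.0]`. For 100 columns this means reading the file 100 times, which is the cost of never holding more than one column at once.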