EDIT: See the end of this post for the working code, which I got from zeekay's answer.
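I have a CSV file with two columns (voltage and current). Because the voltage is recorded to many significant figures and the current to only 2, there are many identical current values while the voltage keeps changing. This doesn't matter for the programming; I'm just explaining how the data is actually acquired. I want to do the following:
As long as the value in the second column (current) does not change, collect the values of the first column (voltage) into a list and average them. Then write one row to a new CSV file containing the average of the voltages from the first column and the unchanged, constant current value from the second column. In other words, if there are 20 rows in which the current does not change (say 6 µA), the 20 corresponding voltage values are averaged (say the average is 600 mV) and a single row reading ('0.6','0.000006') is written to the new csv file. Then I want to keep iterating through the csv being read, repeating this for each group of constant current.
So far I've got the following code, but I'm not sure whether I'm on the right track: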
```python
import sys, csv
with open('filetowriteto.csv','w') as avg:
    loadeddata = open('filetoreadfrom.csv','r')
    writer=csv.writer(avg)
    readloaded=csv.reader(loadeddata)
    listloaded=list(readloaded)
    oldcurrent=listloaded[0][1]
    for row in readloaded:
        newcurrent = row[1]
        biaslist = []
        if newcurrent == oldcurrent:
            biaslist.append(row[0])
        else :
            biasavg = float(sum(biaslist))/len(biaslist)
            writer.writerow([biasavg,newcurrent])
            newcurrent = row[1]
```
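Then I don't know where to go from here.
EDIT: It seems zeekay is heading in the right direction. I'm trying to implement his itertools.groupby() approach, but at the moment I'm producing a blank file. Here's my new code so far: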
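For reference, here is a minimal sketch of how the explicit loop above could be finished without itertools. It assumes the rows come straight from csv.reader as [voltage, current] string pairs; the helper name average_consecutive and the in-memory sample are hypothetical, for illustration only. The changes relative to the loop above: the reader is iterated once (calling list() on it first would consume it), voltages are converted with float(), biaslist is reset only when the current changes, oldcurrent is updated on every row, and the last group is written after the loop ends.

```python
import csv
import io

def average_consecutive(rows):
    """Yield (mean voltage, current) for each run of consecutive rows sharing a current.

    Hypothetical helper: rows are [voltage, current] string pairs from csv.reader.
    """
    biaslist = []
    oldcurrent = None
    for row in rows:
        newcurrent = row[1]
        if biaslist and newcurrent != oldcurrent:
            # the current changed: emit the finished group and start a new one
            yield sum(biaslist) / len(biaslist), oldcurrent
            biaslist = []
        biaslist.append(float(row[0]))
        oldcurrent = newcurrent
    if biaslist:
        # the last group has no following change to trigger it, so emit it here
        yield sum(biaslist) / len(biaslist), oldcurrent

# Hypothetical in-memory data standing in for the real CSV file.
sample = "0.5,0.000006\n0.75,0.000006\n0.25,0.000007\n"
out = io.StringIO()
writer = csv.writer(out)
for biasavg, current in average_consecutive(csv.reader(io.StringIO(sample))):
    writer.writerow([biasavg, current])
print(out.getvalue())
```

With real files, the io.StringIO objects would be replaced by files opened with newline='', as the csv module documentation recommends.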
```python
import sys, csv, itertools
with open('VI_avg(12).csv','w') as avg: # this is the file which gets written
    loadeddata = open('VI(12).csv','r') # this is the file which is read
    writer=csv.writer(avg)
    readloaded=csv.reader(loadeddata)
    listloaded=list(readloaded)
    oldcurrent=listloaded[0][1] # looks like this is no longer required
    for current, row in itertools.groupby(readloaded, lambda x: x[1]):
        biaslist = [float(x[0]) for x in row]
        biasavg = float(sum(biaslist))/len(biaslist)
        # write it out
        writer.writerow(biasavg, current)
```
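The blank output file is consistent with the reader being used up before the loop: listloaded=list(readloaded) reads every row, so itertools.groupby(readloaded, ...) afterwards receives an empty iterator and nothing is written. (Once rows do arrive, writer.writerow(biasavg, current) would also raise a TypeError, because writerow takes a single sequence, i.e. writerow([biasavg, current]).) A small sketch of the exhaustion, using io.StringIO with two rows of the data as a stand-in for the file:

```python
import csv
import io

# Stand-in for open('VI(12).csv'): two rows of the question's data.
readloaded = csv.reader(io.StringIO("0.595417,0.000065\n0.593497,0.000064\n"))
listloaded = list(readloaded)   # consumes every row from the reader
leftover = list(readloaded)     # nothing is left for a later loop to see
print(len(listloaded), len(leftover))  # 2 0
```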
Suppose the CSV file being opened looks like this (shortened example):
0.595417,0.000065
0.595177,0.000065
0.594937,0.000065
0.594697,0.000065
0.594457,0.000065
0.594217,0.000065
0.593977,0.000065
0.593737,0.000065
0.593497,0.000064
0.593017,0.000064
0.592777,0.000064
0.592537,0.000064
0.592297,0.000064
0.587018,0.000064
0.586778,0.000064
0.586538,0.000063
0.586299,0.000063
0.586059,0.000063
0.585579,0.000063
0.585339,0.000063
0.585099,0.000063
0.584859,0.000063
0.584619,0.000063
0.584379,0.000063
0.584139,0.000063
0.583899,0.000063
0.583659,0.000063
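As a sanity check on what the output should look like, here is a sketch that runs the itertools.groupby approach on the shortened sample above, held in memory with io.StringIO instead of read from 'VI(12).csv'. It prints one line per run of constant current (mean voltage to six decimals, then the current string).

```python
import csv
import io
import itertools

# The shortened sample from the question, inlined so the sketch runs on its own.
sample = """\
0.595417,0.000065
0.595177,0.000065
0.594937,0.000065
0.594697,0.000065
0.594457,0.000065
0.594217,0.000065
0.593977,0.000065
0.593737,0.000065
0.593497,0.000064
0.593017,0.000064
0.592777,0.000064
0.592537,0.000064
0.592297,0.000064
0.587018,0.000064
0.586778,0.000064
0.586538,0.000063
0.586299,0.000063
0.586059,0.000063
0.585579,0.000063
0.585339,0.000063
0.585099,0.000063
0.584859,0.000063
0.584619,0.000063
0.584379,0.000063
0.584139,0.000063
0.583899,0.000063
0.583659,0.000063
"""

averages = []
reader = csv.reader(io.StringIO(sample))
for current, rows in itertools.groupby(reader, key=lambda row: row[1]):
    voltages = [float(row[0]) for row in rows]
    averages.append((sum(voltages) / len(voltages), current))

for avg, current in averages:
    print(f"{avg:.6f},{current}")
# 0.594577,0.000065
# 0.591132,0.000064
# 0.585039,0.000063
```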
FINAL UPDATE: Here is the working version I got from zeekay:
```python
import csv
import itertools
with open('VI(12).csv') as input, open('VI_avg(12).csv','w') as output:
    reader = csv.reader(input)
    writer = csv.writer(output)
    for current, row in itertools.groupby(reader, lambda x: x[1]):
        biaslist = [float(x[0]) for x in row]
        biasavg = float(sum(biaslist))/len(biaslist)
        writer.writerow([biasavg, current])
```
Answer 0 (score: 2)
You can use itertools.groupby to group the rows as you read the csv, which simplifies things a lot. Given your updated example:
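```python
import csv
import itertools

# newline='' is what the csv module docs recommend for files used with csv;
# without it the writer's \r\n line endings can show up as blank lines on Windows.
with open('VI(12).csv', newline='') as input, open('VI_avg(12).csv', 'w', newline='') as output:
    reader = csv.reader(input)
    writer = csv.writer(output)
    # groupby yields one (current, rows) pair per run of consecutive rows with the same current
    for current, row in itertools.groupby(reader, lambda x: x[1]):
        biaslist = [float(x[0]) for x in row]      # voltages in this run
        biasavg = sum(biaslist) / len(biaslist)    # mean voltage for the run
        writer.writerow([biasavg, current])
```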
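One property worth knowing here: itertools.groupby only merges adjacent items whose key is equal. That matches the question (one output row per run of constant current), but if the same current reappears later it starts a new group; sorting by the key first makes groupby pool every row with that current. A small sketch with hypothetical rows:

```python
import itertools

# Hypothetical rows: the current 0.000006 appears, changes, then comes back.
rows = [['0.60', '0.000006'], ['0.62', '0.000006'],
        ['0.70', '0.000007'], ['0.58', '0.000006']]

# Consecutive runs only (what the answer above relies on).
by_run = [(cur, [float(r[0]) for r in grp])
          for cur, grp in itertools.groupby(rows, key=lambda r: r[1])]
print(by_run)
# [('0.000006', [0.6, 0.62]), ('0.000007', [0.7]), ('0.000006', [0.58])]

# Sorting by the key first merges every row with the same current.
merged = [(cur, [float(r[0]) for r in grp])
          for cur, grp in itertools.groupby(sorted(rows, key=lambda r: r[1]),
                                            key=lambda r: r[1])]
print(merged)
# [('0.000006', [0.6, 0.62, 0.58]), ('0.000007', [0.7])]
```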
Answer 1 (score: 1)
Maybe you could try using pandas:
```python
import pandas

voltage = [1.1, 1.2, 1.3, 2.1, 2.2, 2.3]
current = [1.0, 1.0, 1.1, 1.3, 1.2, 1.3]
df = pandas.DataFrame({'voltage': voltage, 'current': current})
# groupby('current') pools every row with the same current, whether or not the rows are adjacent
result = df.groupby('current').mean()
print(result)
# Output:
#          voltage
# current
# 1.0         1.15
# 1.1         1.30
# 1.2         2.20
# 1.3         2.20
result.to_csv('grouped_data.csv')
```
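If only consecutive runs should be averaged (as in the question), a common pandas idiom is to number the runs with a shift/cumsum comparison and group by that run number. A minimal sketch, assuming the file has no header row and reading the current as text so equal readings compare exactly; the excerpt of data is inlined via io.StringIO, and named aggregation (`agg(name=(column, func))`) needs pandas 0.25 or later.

```python
import io
import pandas as pd

# Small excerpt of the question's data (no header row in the file).
csv_text = """0.595417,0.000065
0.595177,0.000065
0.593497,0.000064
0.593017,0.000064
0.592777,0.000064
"""

# Keep current as a string so "unchanged" means textually identical, as in the csv-module answers.
df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=['voltage', 'current'], dtype={'current': str})

# True wherever the current differs from the previous row; cumsum() turns
# those change flags into a run number (1, 1, 2, 2, 2 here).
run_id = (df['current'] != df['current'].shift()).cumsum().rename('run')

result = df.groupby(run_id).agg(voltage=('voltage', 'mean'),
                                current=('current', 'first'))
print(result)
```

To write it in the same two-column layout as the csv-module answers, something like result.to_csv('VI_avg(12).csv', header=False, index=False) should do.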
Answer 2 (score: 1)
One way:
```python
import csv

curDict = {}
with open('VI(12).csv', newline='') as infile:
    for row in csv.reader(infile):
        if row[1] not in curDict:  # if not already there, create the key/value pair
            curDict[row[1]] = [float(row[0])]
        else:  # already exists, add to the key/value pair
            curDict[row[1]].append(float(row[0]))
# You'll end up with something like:
# {'0.000065': [0.595417, 0.595177, ...], ...}

# write the rows
with open('VI_avg(12).csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for k, v in curDict.items():
        avgValue = sum(v) / len(v)  # calculate the avg of the voltages
        writer.writerow([k, avgValue])
```
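Answer 3 (score: 0)
This version does what you describe, except that it averages all values with the same current, whether or not they are consecutive. Sorry if that isn't what you want, but maybe it can help you along: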
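```python
import csv
from collections import defaultdict
from functools import reduce  # reduce is no longer a builtin in Python 3

def f(acc, row):
    # acc maps current (row[1]) -> list of voltages (row[0]) seen at that current
    acc[row[1]].append(float(row[0]))
    return acc

with open('in.csv', newline='') as data, open('out.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    r = csv.reader(data)
    reduced = reduce(f, r, defaultdict(list))
    for v, c in reduced.items():
        # v is the current, c the list of voltages: write current, mean voltage
        writer.writerow([v, sum(c)/len(c)])
```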
Answer 4 (score: 0)
Another way, using some very small test data (I left out the csv handling, since you seem to have a handle on that):
```python
#!/usr/bin/python3
test_data = [  # Only 3 currents in the test data:
    (0.00030, 5),  # 5: only one entry, total 0.00030, so the average should be 0.00030
    (0.00012, 6),  # 6: two entries, total 0.00048, so the average should be 0.00024
    (0.00036, 6),
    (0.00001, 7),  # 7: four entries, total 0.00010, so the average should be 0.000025
    (0.00001, 7),
    (0.00001, 7),
    (0.00007, 7)]

currents = dict()
for row in test_data:
    if row[1] not in currents:
        # collect every bias recorded at this current, then average them
        matching_currents = [each[0] for each in test_data if each[1] == row[1]]
        current_average = sum(matching_currents) / len(matching_currents)
        currents[row[1]] = current_average

print("There were {0} unique currents found:\n".format(len(currents)))
for current, bias in currents.items():
    print("Current: {0:2d} ( Average: {1:1.6f} )".format(current, bias))
```
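The version above rescans the whole list once per distinct current, which is fine for a few rows but grows quadratically. A sketch of a single-pass alternative that keeps a running total and count per current (same test data, same output format); like the version above, it pools all rows with equal current rather than only consecutive ones.

```python
test_data = [(0.00030, 5), (0.00012, 6), (0.00036, 6),
             (0.00001, 7), (0.00001, 7), (0.00001, 7), (0.00007, 7)]

# One pass: keep a running [total, count] per current instead of rescanning the list.
totals = {}
for bias, current in test_data:
    entry = totals.setdefault(current, [0.0, 0])
    entry[0] += bias
    entry[1] += 1

averages = {current: total / count for current, (total, count) in totals.items()}

print("There were {0} unique currents found:\n".format(len(averages)))
for current, bias in averages.items():
    print("Current: {0:2d} ( Average: {1:1.6f} )".format(current, bias))
```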