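How do I average the values in one column of a CSV while the value in another column stays unchanged?

Time: 2012-02-22 16:40:24

Tags: python csv

EDIT: See the end of my post for the working code, obtained from zeekay.

I have a CSV file with two columns (voltage and current). Because the voltage is recorded to many significant figures and the current to only 2, many consecutive rows share the same current value while the voltage changes. This doesn't matter for the programming; I'm just explaining how the data actually comes out. I would like to do the following:

As long as the value in the second column (current) does not change, collect the values of the first column (voltage) into a list and average them. Then write a row to a new CSV file containing the average voltage in the first column and the constant, unchanged current in the second. In other words, if there are 20 rows in which the current does not change (say 6 uA), the 20 corresponding voltage values are averaged (say the average is 600 mV), and a row reading ('0.6','0.000006') is written to the new CSV file. I then want to continue iterating through the CSV being read, repeating this for each run of constant current.

So far I have the following code, but I'm not sure I'm on the right track: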

import sys, csv
with open('filetowriteto.csv','w') as avg:
    loadeddata = open('filetoreadfrom.csv','r')
    writer=csv.writer(avg)
    readloaded=csv.reader(loadeddata)
    listloaded=list(readloaded)
    oldcurrent=listloaded[0][1]
    for row in readloaded:
        newcurrent = row[1]
        biaslist = []
        if newcurrent == oldcurrent:
            biaslist.append(row[0])
        else :
            biasavg = float(sum(biaslist))/len(biaslist)
            writer.writerow([biasavg,newcurrent])
            newcurrent = row[1]

And then I don't know where to go from here.
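For comparison, here is a minimal sketch of how a manual loop of this kind could be completed. The helper name `average_runs` and the sample rows are hypothetical and not from the original post. Compared with the attempt above, it keeps the list across rows instead of resetting it on every row, converts the voltages to float before summing, and flushes the last run once the loop ends.

```python
def average_runs(rows):
    """Average column 0 over each consecutive run of equal column-1 values.

    `rows` is any iterable of (voltage, current) string pairs, for example
    a csv.reader. Hypothetical helper, named here for illustration only.
    """
    results = []
    biaslist = []        # voltages collected for the current run
    oldcurrent = None    # current value of the run being collected
    for voltage, current in rows:
        if biaslist and current != oldcurrent:
            # The current changed: emit the finished run, then start a new one.
            results.append((sum(biaslist) / len(biaslist), oldcurrent))
            biaslist = []
        biaslist.append(float(voltage))
        oldcurrent = current
    if biaslist:
        # The last run has no following row to trigger it, so flush it here.
        results.append((sum(biaslist) / len(biaslist), oldcurrent))
    return results


# Made-up rows: two readings at one current, then one at another.
rows = [("0.6", "0.000065"), ("0.4", "0.000065"), ("0.3", "0.000064")]
averaged = average_runs(rows)
```

Each tuple in `averaged` could then be passed to `writer.writerow(...)`.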

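EDIT: It seems zeekay is heading in the right direction. I'm trying to implement his itertools.groupby() approach, but I'm currently producing a blank file. Here is my new code so far: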

import sys, csv, itertools
with open('VI_avg(12).csv','w') as avg: # this is the file which gets written
    loadeddata = open('VI(12).csv','r') # this is the file which is read
    writer=csv.writer(avg)
    readloaded=csv.reader(loadeddata)
    listloaded=list(readloaded)
    oldcurrent=listloaded[0][1] # looks like this is no longer required
    for current, row in itertools.groupby(readloaded, lambda x: x[1]):
        biaslist = [float(x[0]) for x in row]
        biasavg = float(sum(biaslist))/len(biaslist)
        # write it out
        writer.writerow(biasavg, current)
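A likely cause of the blank file is that `list(readloaded)` consumes the reader before `itertools.groupby` ever sees it. A `csv.reader` is a one-pass iterator, so the loop body never runs. The sketch below illustrates this with a small in-memory sample (the `io.StringIO` data is made up for the demonstration):

```python
import csv
import io
import itertools

# Made-up three-row sample standing in for the real file.
sample = io.StringIO("0.595417,0.000065\n0.595177,0.000065\n0.593497,0.000064\n")
reader = csv.reader(sample)

rows = list(reader)       # reads every row; the reader is now exhausted
leftover = list(reader)   # nothing is left to read

# Grouping the exhausted reader yields no groups, so nothing would be written.
groups_from_reader = [key for key, _ in itertools.groupby(reader, lambda r: r[1])]

# Grouping the saved list sees the data.
groups_from_list = [key for key, _ in itertools.groupby(rows, lambda r: r[1])]
```

Separately, `writer.writerow(biasavg, current)` passes two arguments, whereas `writerow` takes a single row, so the values need to be wrapped in a list: `writer.writerow([biasavg, current])`.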

Suppose the CSV file being opened looks like this (shortened example):

0.595417,0.000065
0.595177,0.000065
0.594937,0.000065
0.594697,0.000065
0.594457,0.000065
0.594217,0.000065
0.593977,0.000065
0.593737,0.000065
0.593497,0.000064
0.593017,0.000064
0.592777,0.000064
0.592537,0.000064
0.592297,0.000064
0.587018,0.000064
0.586778,0.000064
0.586538,0.000063
0.586299,0.000063
0.586059,0.000063
0.585579,0.000063
0.585339,0.000063
0.585099,0.000063
0.584859,0.000063
0.584619,0.000063
0.584379,0.000063
0.584139,0.000063
0.583899,0.000063
0.583659,0.000063

FINAL UPDATE: Here is the working version, obtained from zeekay:

import csv
import itertools

with open('VI(12).csv') as input, open('VI_avg(12).csv','w') as output:
    reader = csv.reader(input)
    writer = csv.writer(output)
    for current, row in itertools.groupby(reader, lambda x: x[1]):
        biaslist = [float(x[0]) for x in row]
        biasavg = float(sum(biaslist))/len(biaslist)
        writer.writerow([biasavg, current])

5 Answers:

Answer 0 (score: 2)

You can use itertools.groupby to group the rows as you read the CSV, which simplifies things considerably. Given your updated example:

import csv
import itertools

with open('VI(12).csv') as input, open('VI_avg(12).csv','w') as output:
    reader = csv.reader(input)
    writer = csv.writer(output)
    for current, row in itertools.groupby(reader, lambda x: x[1]):
        biaslist = [float(x[0]) for x in row]
        biasavg = float(sum(biaslist))/len(biaslist)
        writer.writerow([biasavg, current])
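The same loop can be tried without any files. The sketch below uses in-memory buffers and made-up values (chosen so the averages are exact in binary floating point), and shows that `itertools.groupby` only merges adjacent rows with equal keys. A current that reappears later forms a separate group, which matches the "as long as the current does not change" requirement.

```python
import csv
import io
import itertools

# Hypothetical sample: the current 0.000065 appears in two separate runs.
raw = "0.50,0.000065\n0.25,0.000065\n0.75,0.000064\n1.00,0.000065\n"

reader = csv.reader(io.StringIO(raw))
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")

for current, rows in itertools.groupby(reader, key=lambda r: r[1]):
    voltages = [float(r[0]) for r in rows]
    writer.writerow([sum(voltages) / len(voltages), current])

lines = out.getvalue().splitlines()
# lines == ['0.375,0.000065', '0.75,0.000064', '1.0,0.000065']
```

When writing to a real file in Python 3, the csv module documentation recommends opening it with `newline=''` so that no extra blank lines appear on some platforms.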

Answer 1 (score: 1)

Maybe you could try pandas:

import pandas
voltage = [1.1, 1.2, 1.3, 2.1, 2.2, 2.3]
current = [1.0, 1.0, 1.1, 1.3, 1.2, 1.3]
df = pandas.DataFrame({'voltage': voltage, 'current': current}) 
result = df.groupby('current').mean()

# Output of print(result):
#          voltage
# current
# 1.0         1.15
# 1.1         1.30
# 1.2         2.20
# 1.3         2.20

result.to_csv('grouped_data.csv')
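Note that `df.groupby('current')` pools every row with the same current, adjacent or not (the two 1.3 rows above are averaged together). If only consecutive runs should be averaged, one common idiom is to build a run label with `shift()` and `cumsum()` and group on that instead. This is a sketch under that assumption; the names `run_id` and `'run'` are illustrative.

```python
import pandas as pd

voltage = [1.1, 1.2, 1.3, 2.1, 2.2, 2.3]
current = [1.0, 1.0, 1.1, 1.3, 1.2, 1.3]
df = pd.DataFrame({'voltage': voltage, 'current': current})

# A new run starts wherever the current differs from the previous row,
# so the cumulative sum of those change points labels each run.
run_id = (df['current'] != df['current'].shift()).cumsum().rename('run')

# Average the voltage within each run and keep that run's current.
result = df.groupby(run_id).agg(voltage=('voltage', 'mean'),
                                current=('current', 'first'))
```

With this data the two separate 1.3 readings stay in separate rows, giving five runs instead of four groups. `result.to_csv(..., index=False)` would write the two columns without the run label.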

Answer 2 (score: 1)

One way:

curDict = {}
for row in listloaded:  # rows already read from the CSV, as in the question
  if row[1] not in curDict:  # if not already there, create the key/value pair
    curDict[row[1]] = [float(row[0])]
  else:  # already exists, append to the list for that current
    curDict[row[1]].append(float(row[0]))

# You'll end up with:
# {'0.6': [599, 600, 601...], ...}


# write the rows (writer is the csv.writer from the question)
for k, v in curDict.items():
  avgValue = sum(v) / len(v)  # calculate the average of the voltages
  writer.writerow([k, avgValue])

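Answer 3 (score: 0)

This version does what you describe, except that it averages all values sharing the same current, whether or not they are consecutive. Sorry if that isn't what you want, but maybe it helps you along the way: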

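import csv
from collections import defaultdict
from functools import reduce  # reduce is not a builtin in Python 3

def f(acc, row):
    # acc maps each current (second column) to the list of its voltages
    acc[row[1]].append(float(row[0]))
    return acc

with open('in.csv', 'r') as data, open('out.csv', 'w') as out:
  writer = csv.writer(out)
  r = csv.reader(data)

  reduced = reduce(f, r, defaultdict(list))
  for c, v in reduced.items():
      # one row per distinct current: the current, then the mean voltage
      writer.writerow([c, sum(v)/len(v)])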

Answer 4 (score: 0)

Another way, using some very small test data (I've left out the csv handling, since you seem to have a handle on that):

#!/usr/bin/python3

test_data = [       # Only 3 currents in testdata:
    (0.00030,5),    #   5 : Only one entry, total 0.00030 - so should give 0.00030 as the average
    (0.00012,6),    #   6 : Two entries,    total 0.00048 - so should give 0.00024 as the average
    (0.00036,6),
    (0.00001,7),    #   7 : Four entries,   total 0.00010 - so should give 0.000025 as the average
    (0.00001,7),
    (0.00001,7),
    (0.00007,7)]

currents = dict()

for row in test_data:
    if not row[1] in currents:
        matching_currents = list((each[0] for each in test_data if each[1] == row[1]))
        current_average = sum(matching_currents) / len(matching_currents)
        currents[row[1]] = current_average

print("There were {0} unique currents found:\n".format(len(currents)))
for current,bias in currents.items():
    print("Current: {0:2d}   ( Average: {1:1.5f} )".format(current,bias))