How to get the average of specific list elements

Time: 2012-07-24 23:02:17

Tags: python numpy

I have the following input file (an excerpt; the full file has more than 500 lines):

L1, a, b, 10, 20, pass,
L1, c, d, 11, 21, pass,
L1, e, f, 12, 22, pass,
L1, a, b, 13, 23, pass,
L1, e, f, 14, 34, pass,

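I want to average the values of rows whose first three fields are duplicated, i.e. produce output like the following:

(for L1, a, b: 11.5 = (10 + 13) / 2 and 21.5 = (20 + 23) / 2)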

L1, a, b, 11.5, 21.5
L1, c, d, 11, 21
L1, e, f, 13, 28

My current beginner Python code is below; I am still trying to improve it.

    import csv
    from collections import defaultdict
    import numpy as np

    dd = defaultdict(list)
    with open("mean.csv") as input_file:
        for row in csv.reader(input_file):
            dd[tuple(row[:3])].append(float(row[3]))
            dd[tuple(row[:3])].append(float(row[4]))

    for k, v, m in dd.iteritems():
        if len(v) > 1:
            print (' '.join(k), np.mean(v), np.mean(m))

The error I get is:

   Traceback (most recent call last):
   File "average.py", line 11, in <module>
      for k, v, m in dd.iteritems():
   ValueError: need more than 2 values to unpack

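2 answers:

Answer 0: (score: 6)

Untested, but something like this could serve as a base that you adapt for the other column, since it only handles one column at the moment.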

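import csv
from collections import defaultdict
import numpy as np

# Python 2 syntax (iteritems, print statement), matching the question
dd = defaultdict(list)
with open('in.csv') as fin:
    for row in csv.reader(fin):
        # key: first three fields; value: every reading from the fourth field
        dd[tuple(row[:3])].append(float(row[3]))

# each item is a (key, values) pair, so unpack into exactly two names
for k, v in dd.iteritems():
    # only keys that occur more than once are printed
    if len(v) > 1:
        print ' '.join(k), np.mean(v)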
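
To cover both numeric columns, one option is to store each row's two values as a pair under the same key and then average column by column. Iterating over a dict's items yields (key, value) pairs, which is why unpacking into three names in the question raises the ValueError. The following is only a minimal Python 3 sketch under some assumptions: `average_rows` and `SAMPLE` are hypothetical names, the sample rows are copied from the question, `statistics.mean` from the standard library stands in for `np.mean`, and keys that occur only once are kept because the desired output includes them.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Sample rows copied from the question (note the trailing comma on each line)
SAMPLE = """\
L1, a, b, 10, 20, pass,
L1, c, d, 11, 21, pass,
L1, e, f, 12, 22, pass,
L1, a, b, 13, 23, pass,
L1, e, f, 14, 34, pass,
"""


def average_rows(lines):
    """Group rows by their first three fields and average fields 4 and 5."""
    groups = defaultdict(list)
    # skipinitialspace drops the blank that follows each comma
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 5:
            continue  # skip blank or short lines
        groups[tuple(row[:3])].append((float(row[3]), float(row[4])))
    # zip(*pairs) turns the list of (val1, val2) pairs into two columns
    return {
        key: tuple(mean(column) for column in zip(*pairs))
        for key, pairs in groups.items()
    }


if __name__ == "__main__":
    for key, (m1, m2) in average_rows(io.StringIO(SAMPLE)).items():
        # the "g" format drops a trailing ".0", so 11.0 is shown as 11
        print(f"{', '.join(key)}, {m1:g}, {m2:g}")
```

To read from a file instead, pass an open file object, e.g. `average_rows(open("mean.csv", newline=""))`.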

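Answer 1: (score: 1)

With pandas this would be very short (and should be fast).

You can do something like the following (I don't know what the columns mean or how they are named, so the details depend on what you want to use as the index of the DataFrame):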

In [1]: df = pd.read_csv('mean.csv', delimiter=',', header=None)

In [2]: df
Out[2]: 
  X.1 X.2 X.3  X.4  X.5
0  L1   a   b   10   20
1  L1   c   d   11   21
2  L1   e   f   12   22
3  L1   a   b   13   23
4  L1   e   f   14   34

In [3]: df.groupby(['X.1', 'X.2', 'X.3']).mean()
Out[3]: 
              X.4   X.5
X.1 X.2 X.3            
L1   a   b   11.5  21.5
     c   d   11.0  21.0
     e   f   13.0  28.0
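
On recent pandas versions, `header=None` labels the columns with integers rather than the `X.1` style names shown above, and the trailing comma in each row produces an extra empty column. A minimal sketch that accounts for both might look like this; `group_means`, `SAMPLE`, and the column names `line`, `key1`, `key2`, `val1`, `val2` are hypothetical, since the question does not say what the columns mean.

```python
import io

import pandas as pd

# Sample rows copied from the question (note the trailing comma on each line)
SAMPLE = """\
L1, a, b, 10, 20, pass,
L1, c, d, 11, 21, pass,
L1, e, f, 12, 22, pass,
L1, a, b, 13, 23, pass,
L1, e, f, 14, 34, pass,
"""


def group_means(source):
    """Average the two numeric columns per unique (line, key1, key2) triple."""
    # usecols keeps the three key columns and the two numeric ones, dropping
    # "pass" and the empty field left by the trailing comma
    df = pd.read_csv(
        source,
        header=None,
        skipinitialspace=True,
        usecols=[0, 1, 2, 3, 4],
    )
    # Hypothetical column names, chosen only for readability
    df.columns = ["line", "key1", "key2", "val1", "val2"]
    return df.groupby(["line", "key1", "key2"]).mean()


if __name__ == "__main__":
    print(group_means(io.StringIO(SAMPLE)))
```

Passing a path such as `'mean.csv'` instead of the `StringIO` object reads from the file.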