Compute the average of a column when two other columns match

Date: 2014-11-06 19:52:04

Tags: python arrays

I'm looking for a simple way to iterate over a Python array and average the third column whenever the first two columns are the same.

For example, this array:

['0.30', '1.9', 5]
['0.30', '1.9', 33]
['0.30', '1.9', 39]
['0.30', '2.0', 21]
['0.30', '2.0', 51]
['0.30', '2.0', 51]
['0.30', '2.1', 42]
['0.30', '2.1', 34]
['0.30', '2.1', 43]
['0.30', '2.2', 38]
['0.30', '2.2', 34]
['0.30', '2.2', 50]
['0.34', '1.9', 29]
['0.34', '1.9', 47]
['0.34', '2.0', 45]
['0.34', '2.0', 31]
['0.34', '2.0', 45]
['0.34', '2.0', 57]
['0.34', '2.0', 25]

should become:

['0.30', '1.9', 25.66]
['0.30', '2.0', 41.00]
['0.30', '2.1', 39.66]
['0.30', '2.2', 40.66]
['0.30', '2.3', 26.00]
['0.34', '1.9', 38.00]
['0.34', '2.0', 40.60]

How can I do this in Python?

4 answers:

Answer 0 (score: 2)

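from itertools import groupby

# l is the list of rows from the question; groupby only merges
# consecutive rows, so l must already be sorted by its first two columns
final = []
for k, v in groupby(l, lambda x: x[:2]):
    lst = list(v)
    avg = sum(x[2] for x in lst) / float(len(lst))
    # build a new row instead of overwriting lst[0], which would modify l
    final.append(k + [round(avg, 2)])
print(final)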
[['0.30', '1.9', 25.67], ['0.30', '2.0', 41.0], ['0.30', '2.1', 39.67], ['0.30', '2.2', 40.67], ['0.34', '1.9', 38.0], ['0.34', '2.0', 40.6]]
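One caveat: itertools.groupby only merges *adjacent* rows with equal keys, so the loop above works because the sample data is already ordered by the first two columns. A minimal sketch, assuming the rows might arrive unsorted, sorts by the key first; the function name average_third_column is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def average_third_column(rows):
    """Average column 2 for each distinct (column 0, column 1) pair.

    Hypothetical helper: it sorts first because groupby only merges
    adjacent rows that share a key.
    """
    key = itemgetter(0, 1)
    result = []
    for (a, b), group in groupby(sorted(rows, key=key), key=key):
        values = [row[2] for row in group]
        result.append([a, b, round(sum(values) / len(values), 2)])
    return result


# Rows deliberately out of order: without sorted(), ('0.30', '1.9')
# would be split into separate groups.
rows = [
    ['0.30', '1.9', 5],
    ['0.34', '2.0', 45],
    ['0.30', '1.9', 33],
    ['0.34', '2.0', 31],
    ['0.30', '1.9', 39],
]
print(average_third_column(rows))
# [['0.30', '1.9', 25.67], ['0.34', '2.0', 38.0]]
```

Sorting costs O(n log n); if the data is known to be sorted already, the sorted() call can be dropped.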

If you are using Python 3.4 or later, you can compute the mean with the statistics module:

from statistics import mean

avg = mean(x[2] for x in lst)
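For illustration, here is how that line might slot into the groupby loop. This is a sketch that assumes a small input list named rows, already sorted by its first two columns as in the question.

```python
from itertools import groupby
from statistics import mean

rows = [
    ['0.30', '1.9', 5], ['0.30', '1.9', 33], ['0.30', '1.9', 39],
    ['0.30', '2.0', 21], ['0.30', '2.0', 51], ['0.30', '2.0', 51],
    ['0.34', '1.9', 29], ['0.34', '1.9', 47],
]

final = []
# rows must already be sorted by the first two columns for groupby
for key, group in groupby(rows, lambda row: row[:2]):
    avg = mean(row[2] for row in group)  # mean accepts any iterable
    final.append(key + [round(avg, 2)])

print(final)
```

Because mean() consumes the iterable directly, there is no need to build an intermediate list just to call len() on it.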

Answer 1 (score: 2)

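One obvious solution is hashing. I would create a dictionary whose keys are tuples of the first two columns and whose values are lists of the numbers that belong to that pair.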

Here is some code to illustrate:

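data = {}
# array is the list of rows from the question
for item in array:
    data.setdefault((item[0], item[1]), []).append(item[2])

for k, v in data.items():
    print(k, round(sum(v) / len(v), 2))

Result:

('0.30', '1.9') 25.67
('0.30', '2.0') 41.0
('0.30', '2.1') 39.67
('0.30', '2.2') 40.67
('0.34', '1.9') 38.0
('0.34', '2.0') 40.6

Dictionaries are only guaranteed to preserve insertion order from Python 3.7 on; in older versions the groups may print in arbitrary order. Unlike groupby, this approach does not require the input to be sorted.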
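A variant of the same hashing idea, sketched with collections.defaultdict: it keeps only a running total and count per key instead of the full list of values, and sorts the keys when building the result so the order is deterministic on any Python version. The names rows, totals and averages are illustrative.

```python
from collections import defaultdict

rows = [
    ['0.30', '1.9', 5], ['0.30', '1.9', 33], ['0.30', '1.9', 39],
    ['0.34', '2.0', 45], ['0.34', '2.0', 31], ['0.34', '2.0', 45],
    ['0.34', '2.0', 57], ['0.34', '2.0', 25],
]

# (col0, col1) -> [running sum, count]; input order does not matter
totals = defaultdict(lambda: [0, 0])
for a, b, value in rows:
    entry = totals[(a, b)]
    entry[0] += value
    entry[1] += 1

# sort the keys so the output order is stable
averages = [[a, b, round(s / n, 2)] for (a, b), (s, n) in sorted(totals.items())]
print(averages)
# [['0.30', '1.9', 25.67], ['0.34', '2.0', 40.6]]
```

Storing a (sum, count) pair uses constant memory per key, which can matter when each group has many rows.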

Answer 2 (score: 1)

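Looking at some of your other questions, I see you use R. You might be interested in pandas, a Python data analysis library that has some similarities to R.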

import pandas as pd

df = pd.DataFrame([['0.30', '1.9',  5],['0.30', '1.9', 33],['0.30', '1.9', 39]
                  ,['0.30', '2.0', 21],['0.30', '2.0', 51],['0.30', '2.0', 51]
                  ,['0.30', '2.1', 42],['0.30', '2.1', 34],['0.30', '2.1', 43]
                  ,['0.30', '2.2', 38],['0.30', '2.2', 34],['0.30', '2.2', 50]
                  ,['0.34', '1.9', 29],['0.34', '1.9', 47],['0.34', '2.0', 45]
                  ,['0.34', '2.0', 31],['0.34', '2.0', 45],['0.34', '2.0', 57]
                  ,['0.34', '2.0', 25]])

df.groupby([0, 1])[2].mean().reset_index()

Output:

      0    1          2
0  0.30  1.9  25.666667
1  0.30  2.0  41.000000
2  0.30  2.1  39.666667
3  0.30  2.2  40.666667
4  0.34  1.9  38.000000
5  0.34  2.0  40.600000

Answer 3 (score: 1)

As an alternative, you can use zip inside a list comprehension:

>>> l
[['0.30', '1.9', 5], ['0.30', '1.9', 33], ['0.30', '1.9', 39], ['0.30', '2.0', 21], ['0.30', '2.0', 51], ['0.30', '2.0', 51], ['0.30', '2.1', 42], ['0.30', '2.1', 34], ['0.30', '2.1', 43], ['0.30', '2.2', 38], ['0.30', '2.2', 34], ['0.30', '2.2', 50], ['0.34', '1.9', 29], ['0.34', '1.9', 47], ['0.34', '2.0', 45], ['0.34', '2.0', 31], ['0.34', '2.0', 45], ['0.34', '2.0', 57], ['0.34', '2.0', 25]]

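>>> from itertools import groupby
>>> gl = [list(g) for k, g in groupby(l, lambda x: x[:2])]
>>> cols = [list(zip(*i)) for i in gl]  # transpose each group into columns
>>> [[c[0][0], c[1][0], round(sum(c[2]) / len(c[2]), 2)] for c in cols]
[['0.30', '1.9', 25.67], ['0.30', '2.0', 41.0], ['0.30', '2.1', 39.67], ['0.30', '2.2', 40.67], ['0.34', '1.9', 38.0], ['0.34', '2.0', 40.6]]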
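The trick in this answer is that zip(*group) transposes a group of rows into columns, so the third tuple holds all the values to average. A small standalone illustration, using one group from the question's data:

```python
group = [['0.30', '2.0', 21], ['0.30', '2.0', 51], ['0.30', '2.0', 51]]

# zip(*group) pairs up the i-th element of every row, i.e. it transposes
first, second, values = zip(*group)
print(first)   # ('0.30', '0.30', '0.30')
print(second)  # ('2.0', '2.0', '2.0')
print(values)  # (21, 51, 51)
print([first[0], second[0], sum(values) / len(values)])  # ['0.30', '2.0', 41.0]
```

In Python 3, zip returns an iterator rather than a list, which is why indexing it directly (zip(*i)[0]) fails there and the columns have to be unpacked or wrapped in list() first.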