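I'm looking for a simple way to iterate over a Python array and average the third column wherever the first two columns are the same.
For example, this array: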
['0.30', '1.9', 5]
['0.30', '1.9', 33]
['0.30', '1.9', 39]
['0.30', '2.0', 21]
['0.30', '2.0', 51]
['0.30', '2.0', 51]
['0.30', '2.1', 42]
['0.30', '2.1', 34]
['0.30', '2.1', 43]
['0.30', '2.2', 38]
['0.30', '2.2', 34]
['0.30', '2.2', 50]
['0.34', '1.9', 29]
['0.34', '1.9', 47]
['0.34', '2.0', 45]
['0.34', '2.0', 31]
['0.34', '2.0', 45]
['0.34', '2.0', 57]
['0.34', '2.0', 25]
should become:
['0.30', '1.9', 25.66]
['0.30', '2.0', 41.00]
['0.30', '2.1', 39.66]
['0.30', '2.2', 40.66]
['0.30', '2.3', 26.00]
['0.34', '1.9', 38.00]
['0.34', '2.0', 40.60]
How can I do this in Python?
Answer 0 (score: 2)
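from itertools import groupby
# l is the list of rows from the question. It must already be sorted by the
# first two columns, because groupby only groups adjacent rows.
final = []
for k, v in groupby(l, lambda x: x[:2]):
    lst = list(v)
    avg = sum(x[2] for x in lst) / float(len(lst))
    # Reuse the first row of the group to hold the rounded average.
    lst[0][2] = round(avg, 2)
    final.append(lst[0])
print(final)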
[['0.30', '1.9', 25.67], ['0.30', '2.0', 41.0], ['0.30', '2.1', 39.67], ['0.30', '2.2', 40.67], ['0.34', '1.9', 38.0], ['0.34', '2.0', 40.6]]
If you are using Python 3.4 or later, you can use the statistics module to compute the mean:
from statistics import mean
avg = mean(x[2] for x in lst)
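For reference, here is a self-contained Python 3 sketch of the same groupby approach. The names `rows` and `average_by_key` are hypothetical and do not come from the answer above. It assumes the rows are already sorted by the first two columns, since `itertools.groupby` only merges adjacent rows with equal keys. Unlike the code above, it builds new rows instead of modifying the input.

```python
from itertools import groupby
from statistics import mean

# A shortened copy of the rows from the question (illustrative only).
rows = [
    ['0.30', '1.9', 5], ['0.30', '1.9', 33], ['0.30', '1.9', 39],
    ['0.30', '2.0', 21], ['0.30', '2.0', 51], ['0.30', '2.0', 51],
    ['0.34', '1.9', 29], ['0.34', '1.9', 47],
]


def average_by_key(rows):
    """Average column 3 over each run of rows sharing columns 1 and 2."""
    result = []
    for key, group in groupby(rows, key=lambda r: tuple(r[:2])):
        avg = mean(r[2] for r in group)
        # Build a fresh row so the input list is left untouched.
        result.append([key[0], key[1], round(avg, 2)])
    return result


print(average_by_key(rows))
```

If the input is not sorted, calling `sorted(rows, key=lambda r: r[:2])` first would bring equal keys together before grouping.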
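Answer 1 (score: 2)
An obvious solution is to use hashing. I would create a dictionary whose keys are tuples of the first two columns and whose values are lists of the numbers that correspond to that pair.
Here is some code to illustrate: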
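data = {}
for item in array:
    # Key on the first two columns and collect the third-column values.
    data.setdefault((item[0], item[1]), []).append(item[2])
# Python 2: / on two ints is integer division, so the averages are truncated.
for k, v in data.items():
    print k, sum(v)/len(v)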
Result:
('0.30', '2.0') 41
('0.30', '1.9') 25
('0.30', '2.2') 40
('0.34', '2.0') 40
('0.30', '2.1') 39
('0.34', '1.9') 38
Note that the results are not in order, because the data is stored in a hash table.
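A Python 3 sketch of the same dictionary idea, for comparison. It swaps in `collections.defaultdict` for `setdefault`, uses true division so the averages keep their fractional part, and sorts the keys before printing. The names `groups` and `averages` are hypothetical, and the sample `array` is a shortened copy of the question's data.

```python
from collections import defaultdict

# A shortened copy of the rows from the question (illustrative only).
array = [
    ['0.30', '1.9', 5], ['0.30', '1.9', 33], ['0.30', '1.9', 39],
    ['0.34', '2.0', 45], ['0.34', '2.0', 31], ['0.34', '2.0', 45],
    ['0.34', '2.0', 57], ['0.34', '2.0', 25],
]

# Collect every third-column value under its (column 1, column 2) key.
groups = defaultdict(list)
for first, second, value in array:
    groups[(first, second)].append(value)

# In Python 3, / is true division, so the averages are not truncated.
averages = {key: sum(values) / len(values) for key, values in groups.items()}

# Sort the keys (as strings) so the output order is predictable.
for key in sorted(averages):
    print(key, round(averages[key], 2))
```

Because the grouping goes through a dictionary, this version does not need the input to be sorted beforehand.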
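Answer 2 (score: 1)
Looking at some of your other questions, I see that you use R. You might be interested in pandas, a Python data analysis library that has some similarities to R.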
import pandas as pd
df = pd.DataFrame([['0.30', '1.9', 5],['0.30', '1.9', 33],['0.30', '1.9', 39]
,['0.30', '2.0', 21],['0.30', '2.0', 51],['0.30', '2.0', 51]
,['0.30', '2.1', 42],['0.30', '2.1', 34],['0.30', '2.1', 43]
,['0.30', '2.2', 38],['0.30', '2.2', 34],['0.30', '2.2', 50]
,['0.34', '1.9', 29],['0.34', '1.9', 47],['0.34', '2.0', 45]
,['0.34', '2.0', 31],['0.34', '2.0', 45],['0.34', '2.0', 57]
,['0.34', '2.0', 25]])
df.groupby([0,1]).agg(lambda x: x.mean()).reset_index()
Produces:
      0    1          2
0  0.30  1.9  25.666667
1  0.30  2.0  41.000000
2  0.30  2.1  39.666667
3  0.30  2.2  40.666667
4  0.34  1.9  38.000000
5  0.34  2.0  40.600000
Answer 3 (score: 1)
As an alternative answer, you can use zip in a list comprehension:
>>> l
[['0.30', '1.9', 5], ['0.30', '1.9', 33], ['0.30', '1.9', 39], ['0.30', '2.0', 21], ['0.30', '2.0', 51], ['0.30', '2.0', 51], ['0.30', '2.1', 42], ['0.30', '2.1', 34], ['0.30', '2.1', 43], ['0.30', '2.2', 38], ['0.30', '2.2', 34], ['0.30', '2.2', 50], ['0.34', '1.9', 29], ['0.34', '1.9', 47], ['0.34', '2.0', 45], ['0.34', '2.0', 31], ['0.34', '2.0', 45], ['0.34', '2.0', 57], ['0.34', '2.0', 25]]
>>> from itertools import groupby
>>> gl=[list(g) for k,g in groupby(l,lambda x : x[:2])]
>>> [[zip(*i)[0][0],zip(*i)[1][0],sum(zip(*i)[2])/len(i)] for i in gl]
[['0.30', '1.9', 25], ['0.30', '2.0', 41], ['0.30', '2.1', 39], ['0.30', '2.2', 40], ['0.34', '1.9', 38], ['0.34', '2.0', 40]]
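The session above relies on Python 2, where `zip` returns a list and `/` on integers truncates. In Python 3, `zip` returns an iterator, so indexing it as `zip(*i)[0]` raises a `TypeError`. A minimal Python 3 sketch of the same idea follows. The names `firsts`, `seconds`, `values` and `result` are hypothetical, and the sample `l` is a shortened copy of the question's data.

```python
from itertools import groupby

# A shortened copy of the rows from the question (illustrative only).
l = [
    ['0.30', '1.9', 5], ['0.30', '1.9', 33], ['0.30', '1.9', 39],
    ['0.30', '2.0', 21], ['0.30', '2.0', 51], ['0.30', '2.0', 51],
]

# Group adjacent rows that share their first two columns.
gl = [list(g) for k, g in groupby(l, lambda x: x[:2])]

result = []
for group in gl:
    # zip(*group) transposes the rows into three column tuples.
    firsts, seconds, values = zip(*group)
    # True division keeps the fractional part of the average.
    result.append([firsts[0], seconds[0], sum(values) / len(group)])

print(result)
```

Unpacking the transposed columns once per group also avoids calling `zip(*i)` three times for every row of output.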