I have a list of daily values, arranged in a list of dicts like this:
vals = [
{'date': '1-1-2014', 'a': 10, 'b': 33.5, 'c': 82, 'notes': 'high repeat rate'},
{'date': '2-1-2014', 'a': 5, 'b': 11.43, 'c': 182, 'notes': 'normal operations'},
{'date': '3-1-2014', 'a': 0, 'b': 0.5, 'c': 2, 'notes': 'high failure rate'},
...]
What I want to do is get the averages of a, b and c for the month.
Is there a better way than doing something like the following:
val_points = {}
val_len = len(vals)
for day in vals:
    for p in ['a', 'b', 'c']:
        # accumulate a running total per key
        if p in val_points:
            val_points[p] += day[p]
        else:
            val_points[p] = day[p]
# float() avoids integer truncation on Python 2
val_avg = dict((p, val_points[p] / float(val_len)) for p in val_points)
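I haven't run the code above, so it may have bugs, but I hope it gets the idea across. I know there is probably a better way using some combination of operator, itertools and collections.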
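As one illustration of the collections route mentioned above, here is a minimal sketch that sums the fields in a single pass with `collections.Counter`. The name `average_fields` and its `keys` parameter are hypothetical, chosen only for this example, and the sketch assumes every entry contains all of the requested keys.

```python
from collections import Counter

def average_fields(rows, keys=('a', 'b', 'c')):
    """Average the given keys over a list of dicts (hypothetical helper)."""
    if not rows:
        return {}
    totals = Counter()
    for row in rows:
        # Counter.update adds the values to the running totals
        # instead of replacing them, unlike dict.update.
        totals.update({key: row[key] for key in keys})
    count = float(len(rows))
    return {key: totals[key] / count for key in keys}

vals = [
    {'date': '1-1-2014', 'a': 10, 'b': 33.5, 'c': 82, 'notes': 'high repeat rate'},
    {'date': '2-1-2014', 'a': 5, 'b': 11.43, 'c': 182, 'notes': 'normal operations'},
    {'date': '3-1-2014', 'a': 0, 'b': 0.5, 'c': 2, 'notes': 'high failure rate'},
]
averages = average_fields(vals)
print(averages)
```

The non-numeric fields ('date', 'notes') are never touched because only the listed keys are copied into the counter.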
Answer 0 (score: 3)
{p: sum(map(lambda x: x[p], vals)) / float(len(vals)) for p in ['a', 'b', 'c']}
Output:
{'a': 5.0, 'c': 88.66666666666667, 'b': 15.143333333333333}
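One detail worth checking in a one-liner like this is the division. Under Python 2, dividing an integer sum by an integer length truncates, while `/` in Python 3 always returns a float. A small sketch of the difference for the `c` column of the question's sample data, using Python 3's `//` to stand in for the Python 2 integer behaviour:

```python
c_values = [82, 182, 2]  # the 'c' column from the question's sample data

# Floor division: what int / int gives on Python 2.
truncated = sum(c_values) // len(c_values)

# Dividing by a float gives the true mean on both Python 2 and 3.
exact = sum(c_values) / float(len(c_values))

print(truncated)  # 88
print(exact)      # 88.66666666666667
```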
Answer 1 (score: 1)
This may be slightly longer than Elisha's answer, but it uses fewer intermediate data structures, so it may be faster:
from functools import reduce  # built in on Python 2, needs this import on Python 3

KEYS = ['a', 'b', 'c']

def sum_and_count(sums_and_counts, item, key):
    # get() falls back to (0, 0) if nothing is stored in sums_and_counts for this key yet
    prev_sum, prev_count = sums_and_counts.get(key, (0, 0))
    # get() gives a 0 default for a key that is missing from item
    return (prev_sum + item.get(key, 0), prev_count + 1)

sums_and_counts = reduce(lambda sc, item: {key: sum_and_count(sc, item, key) for key in KEYS}, vals, {})
averages = {k: float(total) / no for (k, (total, no)) in sums_and_counts.items()}
print(averages)
Output:
{'a': 5.0, 'c': 88.66666666666667, 'b': 15.143333333333333}
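Whether the `reduce` version is actually faster depends on the data size and the interpreter, so it is worth measuring rather than assuming. Below is a rough sketch using `timeit`. The wrapper names `avg_comprehension` and `avg_reduce` are hypothetical, and the repeated sample data is only there to have something to time.

```python
import timeit
from functools import reduce

KEYS = ['a', 'b', 'c']

def avg_comprehension(vals):
    # One pass over vals per key, in the style of the dict-comprehension answer.
    return {p: sum(x[p] for x in vals) / float(len(vals)) for p in KEYS}

def avg_reduce(vals):
    # A single pass over vals, but a new dict is built for every item.
    def step(sc, item):
        return {k: (sc.get(k, (0, 0))[0] + item.get(k, 0),
                    sc.get(k, (0, 0))[1] + 1) for k in KEYS}
    sums_and_counts = reduce(step, vals, {})
    return {k: float(total) / n for k, (total, n) in sums_and_counts.items()}

# Made-up workload: the question's three rows repeated 100 times.
sample = [{'a': 10, 'b': 33.5, 'c': 82},
          {'a': 5, 'b': 11.43, 'c': 182},
          {'a': 0, 'b': 0.5, 'c': 2}] * 100

for fn in (avg_comprehension, avg_reduce):
    seconds = timeit.timeit(lambda: fn(sample), number=20)
    print(fn.__name__, seconds)
```

The timings will vary between machines and runs, so treat them as a relative comparison only.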
Answer 2 (score: 1)
Since you want the averages per month (assuming the dates are in 'dd-mm-yyyy' format):
vals = [
{'date': '1-1-2014', 'a': 10, 'b': 33.5, 'c': 82, 'notes': 'high repeat rate'},
{'date': '2-1-2014', 'a': 5, 'b': 11.43, 'c': 182, 'notes': 'normal operations'},
{'date': '3-1-2014', 'a': 20, 'b': 0.5, 'c': 2, 'notes': 'high failure rate'},
{'date': '3-2-2014', 'a': 0, 'b': 0.5, 'c': 2, 'notes': 'high failure rate'},
{'date': '4-2-2014', 'a': 20, 'b': 0.5, 'c': 2, 'notes': 'high failure rate'}
]
# collect the values for each key, grouped by month
month = {}
for x in vals:
    newKey = x['date'].split('-')[1]  # month part of 'dd-mm-yyyy'
    if newKey not in month:
        month[newKey] = {}
    for k in 'abc':
        if k in month[newKey]:
            month[newKey][k].append(x[k])
        else:
            month[newKey][k] = [x[k]]
# average each collected list
output = {}
for y in month:
    if y not in output:
        output[y] = {}
    for z in month[y]:
        output[y][z] = sum(month[y][z]) / float(len(month[y][z]))
print(output)
Output:
{'1': {'a': 11.666666666666666, 'c': 88.66666666666667, 'b': 15.143333333333333},
'2': {'a': 10.0, 'c': 2.0, 'b': 0.5}}
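The same grouping can be written more compactly with `collections.defaultdict`, which removes the explicit key checks. This is only a sketch under the same 'dd-mm-yyyy' assumption. `monthly_averages` is a hypothetical name, and the month key is the month string taken from the date, as in the code above.

```python
from collections import defaultdict

def monthly_averages(rows, keys='abc'):
    """Group rows by the month part of 'dd-mm-yyyy' dates and average each key."""
    # month -> key -> list of values; missing levels are created on first access.
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        month = row['date'].split('-')[1]
        for key in keys:
            grouped[month][key].append(row[key])
    return {month: {key: sum(v) / float(len(v)) for key, v in fields.items()}
            for month, fields in grouped.items()}

vals = [
    {'date': '1-1-2014', 'a': 10, 'b': 33.5, 'c': 82, 'notes': 'high repeat rate'},
    {'date': '2-1-2014', 'a': 5, 'b': 11.43, 'c': 182, 'notes': 'normal operations'},
    {'date': '3-1-2014', 'a': 20, 'b': 0.5, 'c': 2, 'notes': 'high failure rate'},
    {'date': '3-2-2014', 'a': 0, 'b': 0.5, 'c': 2, 'notes': 'high failure rate'},
    {'date': '4-2-2014', 'a': 20, 'b': 0.5, 'c': 2, 'notes': 'high failure rate'},
]
result = monthly_averages(vals)
print(result)
```

Note that grouping on the month string alone would merge the same month from different years. If the data spans more than one year, the year would need to be part of the key.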
Answer 3 (score: 0)
If you have data for multiple months, pandas will make your life easier:
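import pandas
df = pandas.DataFrame(vals)
# pandas.datetools has been removed from pandas; to_datetime does the same parsing
df.date = pandas.to_datetime(df.date, dayfirst=True)
df.set_index('date', inplace=True)
# resample(..., how='mean') is the old spelling; 'M' is month-end (pandas 2.2+ spells it 'ME')
means = df[['a', 'b', 'c']].resample('M').mean()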
Result:
             a          b          c
date
2014-01-31   5  15.143333  88.666667