I have several lists, for example:
A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,1,0,0,1,0,1,1,1,0]
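Each float value corresponds, in order, to an integer. The floats represent a category/label, so I don't need to do any arithmetic on them.
I need to find the mean of the integers corresponding to each category. For example, 0.02 maps to 0.33 because (1 + 0 + 0) / 3 = 0.33, and 0.03 maps to 0.5 because (1 + 0) / 2 = 0.5. A category's mean will never be 0.
Then I need to replace the integer values in the list with these means, so: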
A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,1,0,0,1,0,1,1,1,0]
becomes
A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,0.33,0.33,0.33,0.5,0.5,0.75,0.75,0.75,0.75]
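I tried splitting the list into categories and integers, zipping the two together, iterating over the pairs to collect all the integer values for each category, and then computing the mean. Unfortunately, it quickly got out of hand, and I couldn't untangle my multiple nested for loops and if statements.
If anyone could point me in the right direction, I'd be very grateful!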
Answer 0 (score: 2)
You can use boolean masks with fancy indexing on an np.array (with NumPy imported as np):
In [248]: a = np.array(A[:len(A)//2])
In [249]: b = np.array(A[len(A)//2:], dtype=float)
In [250]: for i in set(a):
     ...:     t = b[a == i]
     ...:     b[a == i] = sum(t) * 1.0 / len(t)
     ...: print(b)
[ 0.33333333 0.33333333 0.33333333 0.5 0.5 0.75 0.75
0.75 0.75 ]
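The same boolean-mask idea can also be written as a standalone script. This is only a minimal sketch, assuming the list always holds the labels in its first half and the integers in its second half; the helper name `replace_with_group_means` is made up for illustration and is not part of NumPy.

```python
import numpy as np

def replace_with_group_means(seq):
    """Hypothetical helper: first half of seq = labels, second half = values."""
    half = len(seq) // 2
    labels = np.array(seq[:half])
    values = np.array(seq[half:], dtype=float)
    for label in np.unique(labels):
        mask = labels == label              # boolean mask selecting one category
        values[mask] = values[mask].mean()  # overwrite that category with its mean
    return np.concatenate((labels, values))

A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,1,0,0,1,0,1,1,1,0]
result = replace_with_group_means(A)
```

The loop runs once per distinct label, so it should stay cheap as long as the number of categories is small compared with the length of the list.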
Answer 1 (score: 2)
If your data is laid out like this, one pure-Python way is:
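from itertools import groupby, chain

def float_int_avg(sequence):
    def _do_grouping(sequence):
        half = len(sequence) // 2
        # Cut the sequence into its two halves, then pair them up as (float, int)
        pairs = zip(*zip(*[iter(sequence)] * half))
        for k, g in groupby(pairs, lambda L: L[0]):
            vals = [el[1] for el in g]
            avg = sum(vals, 0.0) / len(vals)
            # Emit the label with its mean once per member of the group
            for _ in range(len(vals)):
                yield k, avg
    # Transpose back to (all labels, all means) and flatten into one list
    return list(chain.from_iterable(zip(*_do_grouping(sequence))))
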
A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,1,0,0,1,0,1,1,1,0]
result = float_int_avg(A)
# [0.02, 0.02, 0.02, 0.03, 0.03, 0.04, 0.04, 0.04, 0.04, 0.3333333333333333, 0.3333333333333333, 0.3333333333333333, 0.5, 0.5, 0.75, 0.75, 0.75, 0.75]
A better approach:
from itertools import groupby, chain, repeat
from operator import itemgetter

def float_int_avg(sequence):
    half = len(sequence) // 2
    floats, ints = sequence[:half], sequence[half:]

    def _group():
        for k, g in groupby(zip(floats, ints), itemgetter(0)):
            vals = [el[1] for el in g]
            # Repeat the group's mean once per member of the group
            yield repeat(sum(vals, 0.0) / len(vals), len(vals))

    return floats + list(chain.from_iterable(_group()))
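Note that `itertools.groupby` only groups consecutive equal keys, so both versions above rely on each category's entries being adjacent, as they are in the example. If that cannot be guaranteed, a dictionary-based sketch along the following lines might be used instead. It assumes the same "labels first, integers second" layout, and the function name `float_int_avg_unsorted` is hypothetical.

```python
from collections import defaultdict

def float_int_avg_unsorted(sequence):
    """Hypothetical variant that does not need equal labels to be adjacent."""
    half = len(sequence) // 2
    floats, ints = sequence[:half], sequence[half:]

    # Collect every integer under its label, wherever the label appears
    groups = defaultdict(list)
    for label, value in zip(floats, ints):
        groups[label].append(value)

    # One mean per label, then look the mean up for each position
    means = {label: sum(vals) / len(vals) for label, vals in groups.items()}
    return floats + [means[label] for label in floats]

A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,1,0,0,1,0,1,1,1,0]
result = float_int_avg_unsorted(A)
```

This makes two passes over the data and uses the float labels as dictionary keys, which is fine here because the labels are treated as exact category markers rather than computed values.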
Answer 2 (score: 2)
Let's put that list into a NumPy array:
>>> import numpy as np
>>> a = np.asarray(A)
>>> a
array([ 0.02, 0.02, 0.02, 0.03, 0.03, 0.04, 0.04, 0.04, 0.04,
1. , 0. , 0. , 1. , 0. , 1. , 1. , 1. , 0. ])
"Each float value corresponds, in order, to an integer." We can split them apart with numpy.split:
>>> labels, values = np.split(a, 2)
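"I need to find the mean of the integers corresponding to each category." This is a job for scipy.ndimage.measurements.mean (also exposed as scipy.ndimage.mean; the measurements namespace is deprecated in recent SciPy releases):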
>>> import scipy.ndimage
>>> avgs = scipy.ndimage.measurements.mean(values, labels, labels)
>>> avgs
array([ 0.33333333, 0.33333333, 0.33333333, 0.5 , 0.5 ,
0.75 , 0.75 , 0.75 , 0.75 ])
"Then I need to replace the integer values in the list with these means." The easiest way is to assemble a new array with numpy.hstack:
>>> np.hstack((labels, avgs))
array([ 0.02 , 0.02 , 0.02 , 0.03 , 0.03 ,
0.04 , 0.04 , 0.04 , 0.04 , 0.33333333,
0.33333333, 0.33333333, 0.5 , 0.5 , 0.75 ,
0.75 , 0.75 , 0.75 ])
Putting it all together:
import numpy as np
import scipy.ndimage

labels, values = np.split(np.asarray(A), 2)
avgs = scipy.ndimage.measurements.mean(values, labels, labels)
A = np.hstack((labels, avgs))
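If SciPy is not available, roughly the same result can be obtained with NumPy alone. The following is a sketch rather than a drop-in replacement for the answer above: it uses `numpy.unique` with `return_inverse=True` and `numpy.bincount`, and the variable names are chosen purely for illustration.

```python
import numpy as np

A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,1,0,0,1,0,1,1,1,0]

labels, values = np.split(np.asarray(A, dtype=float), 2)

# inverse[i] is the position of labels[i] among the sorted unique labels
_, inverse = np.unique(labels, return_inverse=True)

sums = np.bincount(inverse, weights=values)  # per-category sum of the values
counts = np.bincount(inverse)                # per-category number of entries
avgs = (sums / counts)[inverse]              # map each mean back to every position

result = np.hstack((labels, avgs))
```

Because the means are computed once per category and then indexed back out, this avoids a Python-level loop over the categories.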