Perform a calculation on one list based on the values of a second list

Time: 2014-03-01 14:10:57

Tags: python numpy

I have several lists, for example:

A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,1,0,0,1,0,1,1,1,0]

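Each float corresponds, in order, to an integer: the first float pairs with the first integer, the second with the second, and so on. The floats represent a category/label, so I don't need to do any calculation on those values.

I need to find the mean of the integers corresponding to each category. For example: 0.02 = 0.33 because (1 + 0 + 0) / 3 = 0.33, and 0.03 = 0.5 because (1 + 0) / 2 = 0.5. The mean for a category will never be 0.

Then I need to replace the integer values in the list with these means, so: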

A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,1,0,0,1,0,1,1,1,0]

becomes

A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,0.33,0.33,0.33,0.5,0.5,0.75,0.75,0.75,0.75]
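
For reference, the transformation described above can be sketched in plain Python 3. This only illustrates the expected result under the assumption that the first half of the list holds the labels and the second half the integers; the helper name `replace_with_group_means` is hypothetical and not part of the original post.

```python
from collections import defaultdict

def replace_with_group_means(seq):
    """Return the labels followed by the per-label mean of the paired integers."""
    half = len(seq) // 2
    labels, values = seq[:half], seq[half:]

    # Collect the integers that belong to each label.
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)

    # One mean per label, then one mean per original position.
    means = {label: sum(vals) / len(vals) for label, vals in groups.items()}
    return labels + [means[label] for label in labels]

A = [0.02, 0.02, 0.02, 0.03, 0.03, 0.04, 0.04, 0.04, 0.04,
     1, 0, 0, 1, 0, 1, 1, 1, 0]
result = replace_with_group_means(A)
print([round(x, 2) for x in result])
```

Because the grouping goes through a dictionary, this sketch does not depend on equal labels being adjacent in the list.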

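I tried splitting the list into categories and integers, zipping the two together, iterating over them to collect all the integer values for each category, and then computing the mean. Unfortunately it quickly got out of hand, and I couldn't untangle my multiple nested for loops and if statements.

If anyone could point me in the right direction, I would really appreciate it!

3 answers:

Answer 0: (score: 2)

You can use boolean masks with fancy indexing on an np.array (this assumes NumPy has been imported as np):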
In [248]: a = np.array(A[:len(A)//2])

In [249]: b = np.array(A[len(A)//2:], dtype=float)

In [250]: for i in set(a):
     ...:     t=b[a==i]
     ...:     b[a==i]=sum(t)*1.0/len(t)
     ...: print(b)
[ 0.33333333  0.33333333  0.33333333  0.5         0.5         0.75        0.75
  0.75        0.75      ]
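
To make the masking step explicit, here is a minimal standalone sketch of the same idea for Python 3. It assumes NumPy is installed; the names `labels`, `values`, and `mask` are chosen here for illustration and do not appear in the answer above.

```python
import numpy as np

A = [0.02, 0.02, 0.02, 0.03, 0.03, 0.04, 0.04, 0.04, 0.04,
     1, 0, 0, 1, 0, 1, 1, 1, 0]

half = len(A) // 2
labels = np.array(A[:half])
values = np.array(A[half:], dtype=float)  # float dtype so the means are not truncated

for label in np.unique(labels):
    # Boolean array: True at every position that carries this label.
    mask = labels == label
    # Indexing with the mask selects those positions; assigning a scalar fills them all.
    values[mask] = values[mask].mean()

print(values)
```

Comparing floats with `==` is safe in this sketch only because every label is compared against a value taken from the same array.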

Answer 1: (score: 2)

If your data really is laid out like this, one pure-Python way is:

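from itertools import groupby, chain

def float_int_avg(sequence):
    def _do_grouping(sequence):
        # zip(*[iter(sequence)] * n) cuts the list into two halves of length n;
        # the outer zip(*...) then pairs each float label with its integer.
        n = len(sequence) // 2
        for k, g in groupby(zip(*zip(*[iter(sequence)] * n)), lambda L: L[0]):
            vals = [el[1] for el in g]
            avg = sum(vals, 0.0) / len(vals)
            # Emit the label and its mean once per member of the group.
            for _ in range(len(vals)):
                yield k, avg
    return list(chain.from_iterable(zip(*_do_grouping(sequence))))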

A = [0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.04,0.04,1,0,0,1,0,1,1,1,0]
result = float_int_avg(A)
# [0.02, 0.02, 0.02, 0.03, 0.03, 0.04, 0.04, 0.04, 0.04, 0.3333333333333333, 0.3333333333333333, 0.3333333333333333, 0.5, 0.5, 0.75, 0.75, 0.75, 0.75]
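
The nested zip-over-a-shared-iterator expression in this answer is dense, so here is a small sketch of what it does, written for Python 3 (where the built-in `zip` plays the role of `izip`). The four-element list `seq` is a made-up example, not data from the question.

```python
seq = [0.02, 0.03, 1, 0]  # hypothetical: two labels followed by two integers
half = len(seq) // 2

# The same iterator object appears `half` times, so each tuple that zip
# builds consumes `half` consecutive items: the list is cut into chunks.
it = iter(seq)
chunks = list(zip(*[it] * half))

# Zipping the chunks against each other pairs item i of the first half
# with item i of the second half.
pairs = list(zip(*chunks))

print(chunks)
print(pairs)
```

The same pairs can be obtained more directly with `zip(seq[:half], seq[half:])`, which is the route the second version of the answer takes.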

A better approach:

from itertools import groupby, chain, repeat
from operator import itemgetter

def float_int_avg(sequence):
    # First half holds the float labels, second half the integers.
    half = len(sequence) // 2
    floats, ints = sequence[:half], sequence[half:]
    def _group(sequence):
        for k, g in groupby(zip(floats, ints), itemgetter(0)):
            vals = [el[1] for el in g]
            yield repeat(sum(vals, 0.0)/len(vals), len(vals))
    return floats + list(chain.from_iterable(_group(sequence)))
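
One caveat worth keeping in mind with both versions: `itertools.groupby` only merges consecutive items that share a key, so these functions rely on equal labels being adjacent, as they are in the question's data. The sketch below uses a made-up `pairs` list with non-adjacent labels to show the difference.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical data: the label 0.02 appears in two separate runs.
pairs = [(0.02, 1), (0.03, 1), (0.02, 0)]

# Without sorting, 0.02 produces two separate groups.
unsorted_groups = [(k, [v for _, v in g])
                   for k, g in groupby(pairs, key=itemgetter(0))]

# Sorting by label first brings equal labels together.
sorted_groups = [(k, [v for _, v in g])
                 for k, g in groupby(sorted(pairs, key=itemgetter(0)),
                                     key=itemgetter(0))]

print(unsorted_groups)
print(sorted_groups)
```

Sorting changes the order of the items, so if the labels were not already grouped, the original positions would have to be restored afterwards or a dictionary-based grouping used instead.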

Answer 2: (score: 2)

Let's put that list into a NumPy array:

>>> import numpy as np
>>> a = np.asarray(A)
>>> a
array([ 0.02,  0.02,  0.02,  0.03,  0.03,  0.04,  0.04,  0.04,  0.04,
        1.  ,  0.  ,  0.  ,  1.  ,  0.  ,  1.  ,  1.  ,  1.  ,  0.  ])

"Each float corresponds, in order, to an integer." We can split them apart with numpy.split:

>>> labels, values = np.split(a, 2)

"I need to find the mean of the integers corresponding to each category." This is a job for scipy.ndimage.measurements.mean:

>>> import scipy.ndimage
>>> avgs = scipy.ndimage.measurements.mean(values, labels, labels)
>>> avgs
array([ 0.33333333,  0.33333333,  0.33333333,  0.5       ,  0.5       ,
        0.75      ,  0.75      ,  0.75      ,  0.75      ])

"Then I need to replace the integer values in the list with these means." The easiest way is to assemble a new array with numpy.hstack:

>>> np.hstack((labels, avgs))
array([ 0.02      ,  0.02      ,  0.02      ,  0.03      ,  0.03      ,
        0.04      ,  0.04      ,  0.04      ,  0.04      ,  0.33333333,
        0.33333333,  0.33333333,  0.5       ,  0.5       ,  0.75      ,
        0.75      ,  0.75      ,  0.75      ])

Putting it all together:

import numpy as np
import scipy.ndimage

labels, values = np.split(np.asarray(A), 2)
avgs = scipy.ndimage.measurements.mean(values, labels, labels)
A = np.hstack((labels, avgs))
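
If SciPy is not available, the same per-label means can be sketched with NumPy alone. This is an alternative technique swapped in for illustration, not the method used in the answer: it combines `np.unique(..., return_inverse=True)` with `np.bincount`. As far as I know, newer SciPy releases also expose the function used above directly as `scipy.ndimage.mean` and deprecate the `measurements` namespace, so that spelling may be preferable there.

```python
import numpy as np

A = [0.02, 0.02, 0.02, 0.03, 0.03, 0.04, 0.04, 0.04, 0.04,
     1, 0, 0, 1, 0, 1, 1, 1, 0]

labels, values = np.split(np.asarray(A, dtype=float), 2)

# inverse[i] is the index of labels[i] within the sorted unique labels.
uniq, inverse = np.unique(labels, return_inverse=True)

sums = np.bincount(inverse, weights=values)  # sum of the values per label
counts = np.bincount(inverse)                # number of values per label

# Index the per-label means with `inverse` to put one mean at every position.
avgs = (sums / counts)[inverse]

result = np.hstack((labels, avgs))
print(result)
```

Unlike the `groupby` approach, this sketch does not need equal labels to be adjacent, because `np.unique` groups them by value.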