通过平均或重新调整numpy 2d数组来调整大小

时间:2011-11-11 05:58:38

标签: python numpy slice binning

我正在尝试在python中重新实现IDL函数:

http://star.pst.qub.ac.uk/idl/REBIN.html

通过求平均值减去2d阵列的整数因子。

例如:

>>> a=np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

我想通过取相关样本的平均值将其调整为(2,3),预期输出为:

>>> b = rebin(a, (2, 3))
>>> b
array([[  3.5,   5.5,  7.5],
       [ 15.5, 17.5,  19.5]])

即。 b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4])等等。

我相信我应该重塑为4维数组,然后在正确的切片上取平均值,但无法弄清楚算法。你有什么提示吗?

4 个答案:

答案 0 :(得分:34)

以下是基于the answer you've linked的示例(为清晰起见):

>>> import numpy as np
>>> a = np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> a.reshape((2,a.shape[0]//2,3,-1)).mean(axis=3).mean(1)
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

作为一项功能:

def rebin(a, shape):
    sh = shape[0],a.shape[0]//shape[0],shape[1],a.shape[1]//shape[1]
    return a.reshape(sh).mean(-1).mean(1)

答案 1 :(得分:10)

J.F。塞巴斯蒂安对二维装箱有很好的答案。这是他的" rebin"的一个版本。适用于N维的函数:

def bin_ndarray(ndarray, new_shape, operation='sum'):
    """
    Bins an ndarray in all axes based on the target shape, by summing or
        averaging.

    Number of output dimensions must match number of input dimensions and 
        new axes must divide old ones.

    Example
    -------
    >>> m = np.arange(0,100,1).reshape((10,10))
    >>> n = bin_ndarray(m, new_shape=(5,5), operation='sum')
    >>> print(n)

    [[ 22  30  38  46  54]
     [102 110 118 126 134]
     [182 190 198 206 214]
     [262 270 278 286 294]
     [342 350 358 366 374]]

    """
    operation = operation.lower()
    if not operation in ['sum', 'mean']:
        raise ValueError("Operation not supported.")
    if ndarray.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(ndarray.shape,
                                                           new_shape))
    compression_pairs = [(d, c//d) for d,c in zip(new_shape,
                                                  ndarray.shape)]
    flattened = [l for p in compression_pairs for l in p]
    ndarray = ndarray.reshape(flattened)
    for i in range(len(new_shape)):
        op = getattr(ndarray, operation)
        ndarray = op(-1*(i+1))
    return ndarray

答案 2 :(得分:2)

这是一种使用矩阵乘法进行求解的方法,它不需要新的数组维度来划分旧的。

首先我们生成一个行压缩器矩阵和一个列压缩器矩阵(我确信有一种更简洁的方法,甚至可以单独使用numpy操作):

def get_row_compressor(old_dimension, new_dimension):
    dim_compressor = np.zeros((new_dimension, old_dimension))
    bin_size = float(old_dimension) / new_dimension
    next_bin_break = bin_size
    which_row = 0
    which_column = 0
    while which_row < dim_compressor.shape[0] and which_column < dim_compressor.shape[1]:
        if round(next_bin_break - which_column, 10) >= 1:
            dim_compressor[which_row, which_column] = 1
            which_column += 1
        elif next_bin_break == which_column:

            which_row += 1
            next_bin_break += bin_size
        else:
            partial_credit = next_bin_break - which_column
            dim_compressor[which_row, which_column] = partial_credit
            which_row += 1
            dim_compressor[which_row, which_column] = 1 - partial_credit
            which_column += 1
            next_bin_break += bin_size
    dim_compressor /= bin_size
    return dim_compressor


def get_column_compressor(old_dimension, new_dimension):
    return get_row_compressor(old_dimension, new_dimension).transpose()

...例如,get_row_compressor(5, 3)为您提供:

[[ 0.6  0.4  0.   0.   0. ]
 [ 0.   0.2  0.6  0.2  0. ]
 [ 0.   0.   0.   0.4  0.6]]

get_column_compressor(3, 2)为您提供:

[[ 0.66666667  0.        ]
 [ 0.33333333  0.33333333]
 [ 0.          0.66666667]]

然后简单地通过行压缩器预乘,然后通过列压缩器进行后乘,得到压缩矩阵:

def compress_and_average(array, new_shape):
    # Note: new shape should be smaller in both dimensions than old shape
    return np.mat(get_row_compressor(array.shape[0], new_shape[0])) * \
           np.mat(array) * \
           np.mat(get_column_compressor(array.shape[1], new_shape[1]))

使用这种技术,

compress_and_average(np.array([[50, 7, 2, 0, 1],
                               [0, 0, 2, 8, 4],
                               [4, 1, 1, 0, 0]]), (2, 3))

的产率:

[[ 21.86666667   2.66666667   2.26666667]
 [  1.86666667   1.46666667   1.86666667]]

答案 3 :(得分:1)

我试图缩小栅格尺寸 - 使用大约6000 x 2000尺寸的栅格并将其转换为任意大小的较小栅格,以便在之前的垃圾箱尺寸中正确平均值。我找到了一个使用SciPy的解决方案,但后来我无法在我使用的共享主机服务上安装SciPy,所以我只是编写了这个函数。可能有更好的方法来执行此操作,不涉及循环遍历行和列,但这似乎确实有效。

关于这一点的好处是,旧的行数和列数不必被新的行数和列数整除。

def resize_array(a, new_rows, new_cols): 
    '''
    This function takes an 2D numpy array a and produces a smaller array 
    of size new_rows, new_cols. new_rows and new_cols must be less than 
    or equal to the number of rows and columns in a.
    '''
    rows = len(a)
    cols = len(a[0])
    yscale = float(rows) / new_rows 
    xscale = float(cols) / new_cols

    # first average across the cols to shorten rows    
    new_a = np.zeros((rows, new_cols)) 
    for j in range(new_cols):
        # get the indices of the original array we are going to average across
        the_x_range = (j*xscale, (j+1)*xscale)
        firstx = int(the_x_range[0])
        lastx = int(the_x_range[1])
        # figure out the portion of the first and last index that overlap
        # with the new index, and thus the portion of those cells that 
        # we need to include in our average
        x0_scale = 1 - (the_x_range[0]-int(the_x_range[0]))
        xEnd_scale =  (the_x_range[1]-int(the_x_range[1]))
        # scale_line is a 1d array that corresponds to the portion of each old
        # index in the_x_range that should be included in the new average
        scale_line = np.ones((lastx-firstx+1))
        scale_line[0] = x0_scale
        scale_line[-1] = xEnd_scale
        # Make sure you don't screw up and include an index that is too large
        # for the array. This isn't great, as there could be some floating
        # point errors that mess up this comparison.
        if scale_line[-1] == 0:
            scale_line = scale_line[:-1]
            lastx = lastx - 1
        # Now it's linear algebra time. Take the dot product of a slice of
        # the original array and the scale_line
        new_a[:,j] = np.dot(a[:,firstx:lastx+1], scale_line)/scale_line.sum()

    # Then average across the rows to shorten the cols. Same method as above.
    # It is probably possible to simplify this code, as this is more or less
    # the same procedure as the block of code above, but transposed.
    # Here I'm reusing the variable a. Sorry if that's confusing.
    a = np.zeros((new_rows, new_cols))
    for i in range(new_rows):
        the_y_range = (i*yscale, (i+1)*yscale)
        firsty = int(the_y_range[0])
        lasty = int(the_y_range[1])
        y0_scale = 1 - (the_y_range[0]-int(the_y_range[0]))
        yEnd_scale =  (the_y_range[1]-int(the_y_range[1]))
        scale_line = np.ones((lasty-firsty+1))
        scale_line[0] = y0_scale
        scale_line[-1] = yEnd_scale
        if scale_line[-1] == 0:
            scale_line = scale_line[:-1]
            lasty = lasty - 1
        a[i:,] = np.dot(scale_line, new_a[firsty:lasty+1,])/scale_line.sum() 

    return a