Question

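[An earlier version of this post got absolutely no responses, so, in case that was due to a lack of clarity, I have reworked it, with additional explanations and code comments.]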

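I want to compute the mean and standard deviation of subsets of the elements of a numpy n-dimensional array that do not correspond to a single axis (but rather to k > 1 non-consecutive axes), and collect the results in a new (n - k + 1)-dimensional array.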

Does numpy include standard constructs for performing this operation efficiently?

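The function mu_sigma reproduced below is my best attempt at solving this problem, but it has two glaring inefficiencies: 1) it requires copying the original data; 2) it computes the mean twice (since computing the standard deviation requires computing the mean).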
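To make the first inefficiency concrete, here is a minimal sketch (with an illustrative array and axis order chosen to match the example used later) showing that the transpose is a view, while the subsequent reshape has to copy, because the collapsed axes are no longer contiguous in memory:

```python
import numpy as np

# Illustrative 4 x 2 x 4 x 3 x 4 array of floats.
box = np.arange(4 * 2 * 4 * 3 * 4, dtype=float).reshape(4, 2, 4, 3, 4)

# Moving axes 1 and 3 to the end only changes strides: still a view of box.
permuted = box.transpose(0, 2, 4, 1, 3)
print(np.shares_memory(box, permuted))   # True

# Collapsing the last two axes cannot be described by strides alone,
# so reshape silently returns a copy of the data.
reshaped = permuted.reshape(4, 4, 4, -1)
print(np.shares_memory(box, reshaped))   # False
```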

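The mu_sigma function takes two arguments: box and axes. box is an n-dimensional numpy array (aka "ndarray"), and axes is a k-tuple of integers representing k (not necessarily consecutive) of the dimensions of box. The function returns a new (n - k + 1)-dimensional ndarray containing the means and standard deviations computed over the "hyperslabs" of box spanned by the k specified axes.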

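The code below also includes an example of a call to mu_sigma. In this example, the box argument is a 4 x 2 x 4 x 3 x 4 ndarray of floats, and the axes argument is the tuple (1, 3). (Hence, we have n == len(box.shape) == 5 and k == len(axes) == 2.) The result (which here I'll call outbox) returned for this example input is a 4 x 4 x 4 x 2 ndarray of floats. For each triple of indices i, j, k (where each index ranges over the set {0, 1, 2, 3}), the element outbox[i, j, k, 0] is the mean of the 6 elements specified by the numpy expression box[i, 0:2, j, 0:3, k]. Similarly, outbox[i, j, k, 1] is the standard deviation of the same 6 elements. This means that the first n - k == 3 dimensions of the result range over the same indices as the n - k non-axes dimensions of the input ndarray box, which in this case are dimensions 0, 2, and 4.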
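As a small illustration of what a single entry of the result means, the following sketch picks one arbitrary index triple and extracts the corresponding hyperslab directly. The array contents and the chosen indices are assumptions made only for this illustration:

```python
import numpy as np

# Illustrative input with the same shape as in the example: 4 x 2 x 4 x 3 x 4.
box = np.arange(4 * 2 * 4 * 3 * 4, dtype=float).reshape(4, 2, 4, 3, 4)

i, j, k = 1, 2, 3  # arbitrary indices, each in {0, 1, 2, 3}

# The hyperslab over axes 1 and 3 for fixed indices on axes 0, 2 and 4.
slab = box[i, 0:2, j, 0:3, k]
print(slab.shape)  # (2, 3), i.e. 6 elements

# These two numbers are what outbox[i, j, k, 0] and outbox[i, j, k, 1] should hold.
print(slab.mean(), slab.std())
```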

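The strategy used in mu_sigma is to

1. permute the dimensions (using the transpose method) so that the axes specified in the function's second argument are all put at the end; the remaining (non-axes) dimensions are kept at the beginning (in their original order);
2. collapse the axes-dimensions into one (using the reshape method); the new "collapsed" dimension is now the last dimension of the reshaped ndarray;
3. compute an ndarray of means using the last "collapsed" dimension as the axis;
4. compute an ndarray of standard deviations using the last "collapsed" dimension as the axis;
5. return an ndarray obtained by concatenating the ndarrays produced in (3) and (4).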

import numpy as np

def mu_sigma(box, axes):
    inshape = box.shape

    # determine the permutation needed to put all the dimensions given in axes
    # at the end (otherwise preserving the relative ordering of the dimensions)
    nonaxes = tuple([i for i in range(len(inshape)) if i not in set(axes)])

    # permute the dimensions
    permuted = box.transpose(nonaxes + axes)

    # determine the shape of the ndarray after permuting the dimensions and
    # collapsing the axes-dimensions; thanks to Bago for the "+ (-1,)"
    newshape = tuple(inshape[i] for i in nonaxes) + (-1,)

    # collapse the axes-dimensions
    # NB: the next line results in copying the input array
    reshaped = permuted.reshape(newshape)

    # determine the shape for the mean and std ndarrays, as required by
    # the subsequent call to np.concatenate (this reshaping is not necessary
    # if the available mean and std methods support the keepdims keyword;
    # instead, just set keepdims to True in both calls).
    outshape = newshape[:-1] + (1,)

    # compute the means and standard deviations
    mean = reshaped.mean(axis=-1).reshape(outshape)
    std = reshaped.std(axis=-1).reshape(outshape)

    # collect the results in a single ndarray, and return it
    return np.concatenate((mean, std), axis=-1)

inshape = 4, 2, 4, 3, 4
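# np.arange with np.prod works on Python 3 and current numpy
# (map() returns an iterator there, and np.product has been removed)
inbuf = np.arange(np.prod(inshape), dtype=float)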
inbox = np.ndarray(inshape, buffer=inbuf)
outbox = mu_sigma(inbox, tuple(range(len(inshape))[1::2]))

# "inline tests"
assert all(outbox[..., 1].ravel() ==
           [inbox[0, :, 0, :, 0].std()] * outbox[..., 1].size)
assert all(outbox[..., 0].ravel() == [float(4*(v + 3*w) + x)
                                      for v in [8*y - 1
                                                for y in [3*z + 1
                                                          for z in range(4)]]
                                      for w in range(4)
                                      for x in range(4)])

Answer 1

It looks like this gets a bit easier as of numpy 2.0.

http://projects.scipy.org/numpy/ticket/1234
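For illustration, here is a minimal sketch of what this looks like when the mean and std methods accept a tuple of ints for axis. As far as I know, the numpy documentation lists this for releases from 1.7 onward. The helper name mu_sigma_tuple_axes is made up for this example, and np.stack is used to add the trailing dimension of size 2:

```python
import numpy as np

def mu_sigma_tuple_axes(box, axes):
    """Mean and std over several axes at once, stacked along a new last axis.

    Hypothetical helper; assumes mean/std accept a tuple for `axis`.
    """
    mean = box.mean(axis=axes)
    std = box.std(axis=axes)
    # Both results have the non-axes dimensions of box, in their original order.
    return np.stack((mean, std), axis=-1)

inbox = np.arange(4 * 2 * 4 * 3 * 4, dtype=float).reshape(4, 2, 4, 3, 4)
outbox = mu_sigma_tuple_axes(inbox, (1, 3))
print(outbox.shape)  # (4, 4, 4, 2)
```

This removes the explicit transpose-and-reshape step from user code. Note that std still computes its own mean internally, so the second inefficiency mentioned in the question is not addressed by this sketch.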
