Removing the mean from a NumPy matrix

Question

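I have a numpy matrix A in which the data is organised column-vector-wise, i.e. A[:,0] is the first data vector, A[:,1] is the second, and so on. I would like to know whether there is a more elegant way to remove the mean from this data. I am currently doing it with a for loop: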

mean=A.mean(axis=1)
for k in range(A.shape[1]):
    A[:,k]=A[:,k]-mean

Does numpy provide a function that does this? Or can it be done more efficiently another way?

Answer 1

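As is typical, you can do this in a number of ways. Each of the approaches below works by adding a dimension to the mean vector, making it a 4 x 1 array, and then NumPy's broadcasting takes care of the rest. None of them makes a deep copy of mean: the first three return a view, and the last one reshapes mean in place. The first approach (i.e., using newaxis) is likely the one most people prefer, but the other methods are included for the record.

In addition to the approaches below, see also ovgolovin's answer (Answer 2 below), which uses a NumPy matrix to avoid the need to reshape mean altogether.

For the approaches below, we start with the following code and example array A.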

import numpy as np

A = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
mean = A.mean(axis=1)
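
To see why the extra dimension is needed, here is a minimal sketch that compares the shapes involved. The names `broadcast_ok`, `column`, and `centered` are illustrative and do not come from the original answer.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mean = A.mean(axis=1)            # one mean per row, shape (4,)

try:
    A - mean                     # (4, 3) against (4,): trailing sizes 3 and 4 differ
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

column = mean[:, np.newaxis]     # shape (4, 1)
centered = A - column            # (4, 3) against (4, 1) broadcasts across columns

print(A.shape, mean.shape, column.shape, broadcast_ok)
```

Broadcasting compares shapes starting from the last axis, so a length-4 vector cannot line up with rows of length 3 until it is turned into a column.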

Using `numpy.newaxis`

>>> A - mean[:, np.newaxis]
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])

Using `None`

The documentation states that None can be used in place of newaxis. This is because

>>> np.newaxis is None
True

Therefore, the following accomplishes the task.

>>> A - mean[:, None]
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])

That said, newaxis is clearer and should be preferred. A case can also be made that newaxis is more future-proof. See also: Numpy: Should I use newaxis or None?

Using `ndarray.reshape`

>>> A - mean.reshape((mean.shape[0], 1))
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])

Changing `ndarray.shape` directly

You can also change the shape of mean directly. Note that this modifies mean itself.

>>> mean.shape = (mean.shape[0], 1)
>>> A - mean
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])
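
As a rough check of the claim that no data is copied, the sketch below builds the column with the first three approaches and asks NumPy whether each result shares memory with mean. The `candidates` dictionary is just an illustrative container, and `reshape(-1, 1)` is used as a shorthand that lets NumPy infer the length.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mean = A.mean(axis=1)

# Each candidate is a (4, 1) column built from the same 1-D mean.
candidates = {
    "newaxis": mean[:, np.newaxis],
    "None": mean[:, None],
    "reshape": mean.reshape(-1, 1),   # -1 lets NumPy infer the length
}

for name, column in candidates.items():
    # shares_memory reports True when the column reuses mean's buffer.
    print(name, column.shape, np.shares_memory(mean, column))
```

Assigning to `mean.shape` is the odd one out: it does not produce a second object at all, so the 1-D form of mean is no longer available afterwards.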

Answer 2

You can also use matrix instead of array. Then you don't need to reshape:

>>> A = np.matrix([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
>>> m = A.mean(axis=1)
>>> A - m
matrix([[-1.,  0.,  1.],
        [-1.,  0.,  1.],
        [-1.,  0.,  1.],
        [-1.,  0.,  1.]])
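
The reason no reshape is needed is that reductions on a matrix stay two-dimensional. The sketch below compares that with a plain array and shows `keepdims=True`, which is not mentioned in the answer above but gives an ndarray the same (4, 1) result. The NumPy documentation no longer recommends `np.matrix`, so this may be the safer route in new code. The name `rows` is illustrative.

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

M = np.matrix(rows)
print(M.mean(axis=1).shape)                  # matrix reductions stay 2-D

A = np.array(rows)
print(A.mean(axis=1).shape)                  # ndarray reductions drop the axis
print(A.mean(axis=1, keepdims=True).shape)   # unless keepdims=True is passed

# Same centering as the matrix version, but on a plain ndarray.
centered = A - A.mean(axis=1, keepdims=True)
```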

Answer 3

Yes. pylab.demean:

In [1]: X = scipy.rand(2,3)

In [2]: X.mean(axis=1)
Out[2]: array([ 0.42654669,  0.65216704])

In [3]: Y = pylab.demean(X, axis=1)

In [4]: Y.mean(axis=1)
Out[4]: array([  1.85037171e-17,   0.00000000e+00])

Source:

In [5]: pylab.demean??
Type:           function
Base Class:     <type 'function'>
String Form:    <function demean at 0x38492a8>
Namespace:      Interactive
File:           /usr/lib/pymodules/python2.7/matplotlib/mlab.py
Definition:     pylab.demean(x, axis=0)
Source:
def demean(x, axis=0):
    "Return x minus its mean along the specified axis"
    x = np.asarray(x)
    if axis == 0 or axis is None or x.ndim <= 1:
        return x - x.mean(axis)
    ind = [slice(None)] * x.ndim
    ind[axis] = np.newaxis
    return x - x.mean(axis)[ind]
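
The session above comes from an old Python 2.7 installation, and `pylab.demean` may not be available in current matplotlib releases. The listed source also indexes with a Python list (`[ind]`), which newer NumPy versions may reject in favour of tuple indices. As a hedged alternative, here is a small standalone rewrite of the same idea using `keepdims=True`. It is an illustrative sketch, not matplotlib's code, and the random generator and seed are arbitrary choices for the example.

```python
import numpy as np

def demean(x, axis=0):
    """Return x minus its mean along the given axis (illustrative rewrite)."""
    x = np.asarray(x)
    if axis is None:
        return x - x.mean()
    # keepdims=True leaves a length-1 axis in place, so the mean
    # broadcasts against x without any manual indexing.
    return x - x.mean(axis=axis, keepdims=True)

rng = np.random.default_rng(0)   # seed chosen arbitrarily for the example
X = rng.random((2, 3))
Y = demean(X, axis=1)
print(Y.mean(axis=1))            # values should be zero up to rounding error
```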

Answer 4

It looks like some of these answers are pretty old. I just tested this on numpy 1.13.3:

>>> import numpy as np
>>> a = np.array([[1,1,3],[1,0,4],[1,2,2]])
>>> a
array([[1, 1, 3],
       [1, 0, 4],
       [1, 2, 2]])
>>> a = a - a.mean(axis=0)
>>> a
array([[ 0.,  0.,  0.],
       [ 0., -1.,  1.],
       [ 0.,  1., -1.]])

I think this is cleaner and simpler. Give it a try and let me know if it is somehow inferior to the other answers.
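
One caveat: this answer subtracts the mean along axis=0, while the question takes it along axis=1. The sketch below, using an illustrative array `b` that is not from the thread, suggests why the plain subtraction is enough for axis=0 but not for axis=1.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0: one mean per column, shape (3,), which lines up with
# the last axis of b, so plain subtraction broadcasts correctly.
per_column = b - b.mean(axis=0)

# axis=1: one mean per row. Because b is square the shapes still match,
# so this runs without error but subtracts row means across columns.
wrong = b - b.mean(axis=1)

# Turning the row means into a column gives the intended result.
per_row = b - b.mean(axis=1)[:, np.newaxis]

print(per_column.mean(axis=0), per_row.mean(axis=1), wrong.mean(axis=1))
```

With a non-square array the `wrong` line would raise a ValueError instead of silently giving a different result, which is arguably the more dangerous case to watch for.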
