Numpy: modify array in place?

Time: 2012-04-13 23:10:30

Tags: python arrays numpy in-place

I have the following code, which attempts to normalize the values of an m x n array (it will be used as input to a neural network, where m is the number of training examples and n is the number of features).

However, when I check the array in the interpreter after the script has run, I see that the values have not been normalized; that is, they still have their original values. I suppose this is because the assignment to the array variable inside the function is only visible inside the function.

How can I do this normalization in place? Or do I have to return a new array from the normalize function?

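For reference, a sketch of the kind of normalize function being asked about, reconstructed from the answers below (the key point is that the final line rebinds the local name array to a new array instead of writing into the caller's buffer):

def normalize(array, imin = -1, imax = 1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""

    dmin = array.min()
    dmax = array.max()

    # This builds a brand new array and binds the *local* name `array` to it;
    # the array passed in by the caller is never touched.
    array = imin + (imax - imin) * (array - dmin) / (dmax - dmin)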

4 Answers:

Answer 0 (score: 22):

If you want to apply mathematical operations to a numpy array in place, you can simply use the standard in-place operators +=, -=, /=, and so on. So for example:

>>> import numpy
>>> def foo(a):
...     a += 10
... 
>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> foo(a)
>>> a
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

The in-place versions of these operations are also a bit faster, especially for larger arrays:

>>> def normalize_inplace(array, imin=-1, imax=1):
...         dmin = array.min()
...         dmax = array.max()
...         array -= dmin
...         array *= imax - imin
...         array /= dmax - dmin
...         array += imin
...     
>>> def normalize_copy(array, imin=-1, imax=1):
...         dmin = array.min()
...         dmax = array.max()
...         return imin + (imax - imin) * (array - dmin) / (dmax - dmin)
... 
>>> a = numpy.arange(10000, dtype='f')
>>> %timeit normalize_inplace(a)
10000 loops, best of 3: 144 us per loop
>>> %timeit normalize_copy(a)
10000 loops, best of 3: 146 us per loop
>>> a = numpy.arange(1000000, dtype='f')
>>> %timeit normalize_inplace(a)
100 loops, best of 3: 12.8 ms per loop
>>> %timeit normalize_copy(a)
100 loops, best of 3: 16.4 ms per loop

Answer 1 (score: 4):

def normalize(array, imin = -1, imax = 1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""

    dmin = array.min()
    dmax = array.max()


    array -= dmin
    array *= (imax - imin)
    array /= (dmax-dmin)
    array += imin

    print array[0]

Answer 2 (score: 4):

Here is a trick that is slightly more general than the other useful answers here:

def normalize(array, imin = -1, imax = 1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""

    dmin = array.min()
    dmax = array.max()

    array[...] = imin + (imax - imin)*(array - dmin)/(dmax - dmin)

Here we are assigning values through the view array[...] rather than binding those values to some new local variable inside the scope of the function.

import numpy as np
x = np.arange(5, dtype='float')
print x
normalize(x)
print x

>>> [0. 1. 2. 3. 4.]
>>> [-1.  -0.5  0.   0.5  1. ]
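
The difference between the two kinds of assignment can be seen in isolation with a small sketch (not part of the original answer): plain assignment only rebinds the function-local name, while assignment through the array[...] view writes into the caller's buffer.

import numpy as np

def rebind(array):
    # Builds a new array and points the local name at it;
    # the caller's array is left untouched.
    array = array - array.min()

def write_through(array):
    # Fills the existing buffer through the view, so the
    # caller sees the change.
    array[...] = array - array.min()

x = np.arange(3, 6, dtype='float')
rebind(x)
print x          # [ 3.  4.  5.]  -- unchanged
write_through(x)
print x          # [ 0.  1.  2.]  -- modified in place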

Edit:

This is slower; it allocates a new array. But it can still be valuable if you are doing something more complicated where the built-in in-place operations are cumbersome or don't suffice.

def normalize2(array, imin=-1, imax=1):
    dmin = array.min()
    dmax = array.max()

    array -= dmin
    array *= (imax - imin)
    array /= (dmax-dmin)
    array += imin

A = np.random.randn(200**3).reshape([200] * 3)
%timeit -n5 -r5 normalize(A)
%timeit -n5 -r5 normalize2(A)

>> 47.6 ms ± 678 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)
>> 26.1 ms ± 866 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)

Answer 3 (score: 1):

There is also a convenient way to do this kind of normalization with numpy: np.vectorize is very useful when combined with a lambda function applied to an array. See the example below:

import numpy as np

def normalizeMe(value,vmin,vmax):

    vnorm = float(value-vmin)/float(vmax-vmin)

    return vnorm

imin = 0
imax = 10
feature = np.random.randint(10, size=10)

# Vectorize your function (only need to do it once)
temp = np.vectorize(lambda val: normalizeMe(val,imin,imax)) 
normfeature = temp(np.asarray(feature))

print feature
print normfeature

One can compare the performance against a plain generator expression, though there are likely many other ways to do this as well.

%%timeit
temp = np.vectorize(lambda val: normalizeMe(val,imin,imax)) 
normfeature1 = temp(np.asarray(feature))
10000 loops, best of 3: 25.1 µs per loop


%%timeit
normfeature2 = [i for i in (normalizeMe(val,imin,imax) for val in feature)]
100000 loops, best of 3: 9.69 µs per loop

%%timeit
normalize(np.asarray(feature))
100000 loops, best of 3: 12.7 µs per loop

So vectorize is definitely not the fastest, but it can be convenient in cases where performance is not as important.
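
For comparison (a sketch, not part of the original answer), the same scaling written as a plain numpy expression avoids the per-element Python function calls that np.vectorize makes under the hood:

import numpy as np

imin, imax = 0, 10
feature = np.random.randint(10, size=10)

# One vectorized pass in C instead of one Python call per element.
normfeature = (feature - imin) / float(imax - imin)

print feature
print normfeature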