Question

How can a list of vectors be normalized efficiently in NumPy?

Here is an example that does not work:

from numpy import *

vectors = array([arange(10), arange(10)])  # All x's, then all y's
norms = apply_along_axis(linalg.norm, 0, vectors)

# Now, what I was expecting would work:
print(vectors.T / norms)  # vectors.T has 10 elements, as does norms, but this does not work

The last operation fails with "shape mismatch: objects cannot be broadcast to a single shape".

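How can the 2D vectors in vectors be normalized elegantly with NumPy?

Edit: Why does the above not work, while adding a dimension to norms does work (as per my answer below)?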

Answer 1

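Computing the magnitude

I came across this question and became curious about your method for normalizing. I use a different method to compute the magnitudes. Note: I also typically compute norms across the last index (rows in this case, not columns).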

magnitudes = np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]

Typically, however, I just normalize like so:

vectors /= np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
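
As a minimal sketch of the in-place version (the sample array and variable names are illustrative assumptions, not part of the answer), note that `/=` needs a floating-point array, because the result of the division cannot be stored back into an integer array:

```python
import numpy as np

# Illustrative data: three 2D vectors stored one per row (float dtype).
vectors = np.array([[3.0, 4.0], [1.0, 0.0], [5.0, 12.0]])

# Magnitude of each row; the trailing new axis gives shape (3, 1),
# which broadcasts against the (3, 2) array.
magnitudes = np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]

# In-place normalization; this would raise for an integer array.
vectors /= magnitudes

# Recompute the row magnitudes to check the result.
row_norms = np.sqrt((vectors ** 2).sum(-1))
```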

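Time comparison

I ran a test to compare the times and found that my method is faster by quite a bit, but Freddie Witherdon's suggestion is even faster.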

import numpy as np    
vectors = np.random.rand(100, 25)

# OP's
%timeit np.apply_along_axis(np.linalg.norm, 1, vectors)
# Output: 100 loops, best of 3: 2.39 ms per loop

# Mine
%timeit np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
# Output: 10000 loops, best of 3: 13.8 us per loop

# Freddie's (from comment below)
%timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
# Output: 10000 loops, best of 3: 6.45 us per loop
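
`%timeit` is an IPython magic, so the lines above only run inside IPython or Jupyter. A rough sketch of the same comparison with the standard-library `timeit` module follows; the loop counts and the `candidates` dictionary are assumptions chosen for illustration, and the timings will differ by machine and NumPy version.

```python
import timeit
import numpy as np

vectors = np.random.rand(100, 25)

# The three approaches compared above, wrapped as zero-argument callables.
candidates = {
    "apply_along_axis": lambda: np.apply_along_axis(np.linalg.norm, 1, vectors),
    "sqrt of sum": lambda: np.sqrt((vectors ** 2).sum(-1)),
    "einsum": lambda: np.sqrt(np.einsum('...i,...i', vectors, vectors)),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats of 100 calls each, reported per call in microseconds.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    timings[name] = best * 1e6
    print(f"{name}: {timings[name]:.2f} us per loop")

# All three should agree on the magnitudes themselves.
results = [func() for func in candidates.values()]
```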

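Note that, as a StackOverflow answer points out, einsum skips some safety checks, so you should make sure that the dtype of vectors is sufficient to store the square of the magnitudes accurately enough.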
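
To illustrate the dtype concern, here is a small sketch with a deliberately narrow integer type (the `int8` data is an assumption made up for this example). With such input the squares do not fit in the dtype, so converting to a floating-point type before squaring is one way to keep the result accurate:

```python
import numpy as np

# Illustrative data: components of 100 fit in int8, but 100**2 does not.
vectors = np.array([[100, 100]], dtype=np.int8)

# Risky: the products are formed in the narrow input dtype and may wrap around.
risky_squares = np.einsum('...i,...i', vectors, vectors)

# Safer: convert to float64 first so the squares are stored accurately.
as_float = vectors.astype(np.float64)
magnitudes = np.sqrt(np.einsum('...i,...i', as_float, as_float))

print(risky_squares, magnitudes)
```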

Answer 2

Well, unless I missed something, this does work:

vectors / norms

The problem with your suggestion lies in the broadcasting rules.

vectors  # shape 2, 10
norms  # shape 10

The shapes do not have the same length! So the rule is to first extend the shorter shape by one on the left:

norms  # shape 1,10

You can do this manually with:

vectors / norms.reshape(1,-1)  # same as vectors/norms

If you want to compute vectors.T/norms, you have to do the reshaping manually, as follows:

vectors.T / norms.reshape(-1,1)  # this works
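
The broadcasting behaviour described above can be checked with a short sketch. The data here is an illustrative assumption (values start at 1 so that no vector has zero length), and the norms are computed with a plain sum of squares:

```python
import numpy as np

# Illustrative data: two rows (all x's, then all y's), starting at 1 to avoid a zero norm.
vectors = np.array([np.arange(1, 11), np.arange(1, 11)], dtype=float)  # shape (2, 10)
norms = np.sqrt((vectors ** 2).sum(axis=0))                            # shape (10,)

# (2, 10) / (10,): norms is treated as (1, 10), so this broadcasts.
by_columns = vectors / norms

# (10, 2) / (10,): norms is still treated as (1, 10), which clashes with (10, 2).
try:
    vectors.T / norms
    transposed_failed = False
except ValueError:
    transposed_failed = True

# Reshaping norms to (10, 1) makes the transposed division work.
by_rows = vectors.T / norms.reshape(-1, 1)
```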

Answer 3

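Alright: NumPy's shape broadcasting adds dimensions to the left of the array shape, not to its right. NumPy can, however, be instructed to add a dimension to the right of the norms array: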

print(vectors.T / norms[:, newaxis])

This does work!
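
A related option, not used in the answer above, is to request the extra dimension when the norms are computed. This sketch assumes a NumPy version in which `np.linalg.norm` accepts the `axis` and `keepdims` arguments; the sample data is illustrative:

```python
import numpy as np

vectors = np.array([np.arange(1, 11), np.arange(1, 11)], dtype=float)  # shape (2, 10)

# Add the new axis on the right by hand.
norms = np.linalg.norm(vectors, axis=0)             # shape (10,)
manual = vectors.T / norms[:, np.newaxis]           # (10, 2) / (10, 1)

# keepdims=True keeps the reduced axis with length 1, so no reshaping is needed.
kept = np.linalg.norm(vectors.T, axis=1, keepdims=True)  # shape (10, 1)
automatic = vectors.T / kept
```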

Answer 4

There is already a function for this in scikit-learn:

import sklearn.preprocessing as preprocessing
norm = preprocessing.normalize(m, norm='l2')

More info at:

http://scikit-learn.org/stable/modules/preprocessing.html

Answer 5

My preferred way to normalize vectors is to use numpy's inner1d to calculate their magnitudes. Here is what has been suggested so far, compared to inner1d:

import numpy as np
from numpy.core.umath_tests import inner1d
COUNT = 10**6 # 1 million points

points = np.random.random_sample((COUNT,3,))
A      = np.sqrt(np.einsum('...i,...i', points, points))
B      = np.apply_along_axis(np.linalg.norm, 1, points)   
C      = np.sqrt((points ** 2).sum(-1))
D      = np.sqrt((points*points).sum(axis=1))
E      = np.sqrt(inner1d(points,points))

print([np.allclose(E,x) for x in [A,B,C,D]]) # [True, True, True, True]

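Testing performance with cProfile:

import cProfile
# Note: the trailing **0.5 takes a second square root, so this line times slightly more work than A above.
cProfile.run("np.sqrt(np.einsum('...i,...i', points, points))**0.5") # 3 function calls in 0.013 seconds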
cProfile.run('np.apply_along_axis(np.linalg.norm, 1, points)')       # 9000018 function calls in 10.977 seconds
cProfile.run('np.sqrt((points ** 2).sum(-1))')                       # 5 function calls in 0.028 seconds
cProfile.run('np.sqrt((points*points).sum(axis=1))')                 # 5 function calls in 0.027 seconds
cProfile.run('np.sqrt(inner1d(points,points))')                      # 2 function calls in 0.009 seconds

inner1d computed the magnitudes a hair quicker than einsum. So, using inner1d to normalize:

n = points/np.sqrt(inner1d(points,points))[:,None]
cProfile.run('points/np.sqrt(inner1d(points,points))[:,None]') # 2 function calls in 0.026 seconds

Testing against scikit:

import sklearn.preprocessing as preprocessing
n_ = preprocessing.normalize(points, norm='l2')
cProfile.run("preprocessing.normalize(points, norm='l2')") # 47 function calls in 0.047 seconds
np.allclose(n,n_) # True

Conclusion: using inner1d seems to be the best option.
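
`numpy.core.umath_tests` is an internal NumPy module, so the `inner1d` import may warn or fail depending on the installed version. As a hedged fallback sketch, the same row-wise inner product can be written with `np.einsum`, which was already part of the comparison above; the helper name `normalize_rows` is hypothetical:

```python
import numpy as np

def normalize_rows(points):
    """Return a copy of points with each row scaled to unit length."""
    # Row-wise inner product of each point with itself (its squared magnitude).
    squared = np.einsum('ij,ij->i', points, points)
    return points / np.sqrt(squared)[:, None]

# Illustrative data: 1000 random 3D points.
points = np.random.random_sample((1000, 3))
n = normalize_rows(points)
```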

Answer 6

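For the two-dimensional case, using np.hypot(vectors[:,0],vectors[:,1]) appears to be faster than Freddie's np.sqrt(np.einsum('...i,...i', vectors, vectors)) for calculating the magnitudes. (Referencing the answer by Geoff.)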

import numpy as np

# Generate array of 2D vectors.
vectors = np.random.random((1000,2))

# Using Freddie's
%timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
# Output: 11.1 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Using numpy.hypot()
%timeit np.hypot(vectors[:,0], vectors[:,1])
# Output: 6.81 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

To get the normalized vectors, then do:

vectors /= np.hypot(vectors[:,0], vectors[:,1])[:, np.newaxis]  # new axis so (1000, 2) divides by (1000, 1)
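
Putting the two steps together, here is a minimal sketch (with made-up random data, offset to avoid zero-length vectors) that checks the `np.hypot` magnitudes against the einsum version and confirms that the normalized vectors have unit length:

```python
import numpy as np

vectors = np.random.random((1000, 2)) + 0.1  # offset avoids zero-length vectors

# Magnitude of each 2D vector, shape (1000,).
magnitudes = np.hypot(vectors[:, 0], vectors[:, 1])

# Add a trailing axis so that (1000, 2) divides by (1000, 1).
normalized = vectors / magnitudes[:, np.newaxis]

# Cross-check against the einsum-based magnitudes.
einsum_magnitudes = np.sqrt(np.einsum('...i,...i', vectors, vectors))
```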

NumPy: How to quickly normalize many vectors?

6 answers:
