Question

Say I have 1D numpy arrays X (features) and Y (binary classes), and a function f that takes two slices each of X and Y and computes a number.

I also have an array of indices S by which I need to split X and Y. It is guaranteed that no slice will be empty.
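Judging from the loop in the question, "splitting by S" means something specific. Each S[i] is a single cut point that produces one (left, right) pair. This is not the same as cutting X into len(S) + 1 pieces, which is what np.split would do. The tiny illustration below uses hypothetical toy values chosen only for this example:

```python
import numpy as np

# Hypothetical toy data, only to illustrate the splitting
X = np.array([10, 11, 12, 13, 14, 15])
S = np.array([2, 4])

# One two-way split per index in S (what the loop in the question does)
pairs = [(X[:s], X[s:]) for s in S]
# pairs[0] -> (array([10, 11]), array([12, 13, 14, 15]))
# pairs[1] -> (array([10, 11, 12, 13]), array([14, 15]))

# Contrast: np.split cuts X at all indices at once, giving len(S) + 1 pieces
pieces = np.split(X, S)
# pieces -> [array([10, 11]), array([12, 13]), array([14, 15])]
```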

So my code looks like this:

```python
import numpy as np

def f(x_left, y_left, x_right, y_right):
    n = x_left.shape[0] + x_right.shape[0]
    lcond = y_left == 1
    rcond = y_right == 1
    hleft = 1 - ((y_left[lcond].shape[0])**2 + (y_left[~lcond].shape[0])**2) / n**2
    hright = 1 - ((y_right[rcond].shape[0])**2 + (y_right[~rcond].shape[0])**2) / n**2
    return -(x_left.shape[0] / n) * hleft - (x_right.shape[0] / n) * hright

# One result per split index in S
results = np.empty(len(S))
for i in range(len(S)):
    results[i] = f(X[:S[i]], Y[:S[i]], X[S[i]:], Y[S[i]:])
```

The array `results` must contain the result of f for each split in S.
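Reading f closely, it never uses the values in X, only the slice lengths. From Y it only needs the number of entries equal to 1 on each side. Some notation, introduced here for illustration:

- s = S[i] is the split index.
- n is the length of X.
- ℓ and r are the numbers of ones in Y[:s] and Y[s:].

With that notation, f reduces to:

$$
h_{\text{left}} = 1 - \frac{\ell^2 + (s - \ell)^2}{n^2},
\qquad
h_{\text{right}} = 1 - \frac{r^2 + (n - s - r)^2}{n^2},
$$

$$
f = -\frac{s}{n}\, h_{\text{left}} - \frac{n - s}{n}\, h_{\text{right}}.
$$

Since r is just the total number of ones minus ℓ, each split is characterized by a couple of integer counts. That is what makes a vectorized formulation possible.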

My question is: how can I perform this computation in a vectorized way with numpy to make this code faster?

Answer 1

First, let's make your function a bit more efficient. You're doing some unnecessary indexing operations. Instead of `y_left[lcond].shape[0]`, you only need `lcond.sum()` or `len(lcond.nonzero()[0])`.

Here's an improved loopy version of your code (complete with dummy input):

```python
import numpy as np

# Dummy input
n = 1000
X = np.random.randint(0,n,n)
Y = np.random.randint(0,n,n)
S = np.random.choice(n//2, n)

def f2(x, y, s):
    """Same loopy solution as original, only faster"""
    n = x.size
    isone = y == 1
    lval = len(isone[:s].nonzero()[0])  # number of ones in the left slice
    rval = len(isone[s:].nonzero()[0])  # number of ones in the right slice
    hleft = 1 - (lval**2 + (s - lval)**2) / n**2
    hright = 1 - (rval**2 + (n - s - rval)**2) / n**2
    return - s / n * hleft - (n - s) / n * hright

def time_newloop():
    """Callable front-end for timing comparisons"""
    results = np.empty(len(S))
    for i in range(len(S)):
        results[i] = f2(X, Y, S[i])
    return results
```

The changes are fairly straightforward.

Now, it turns out that we can indeed vectorize your loop. To do this, we have to compare against every element of `S` at the same time. We do that by creating a 2D mask of shape `(nS, n)` (where `S.size == nS`). Each row of this mask cuts off the values up to the corresponding element of `S`. Here's how:

```python
def f3(X, Y, S):
    """Vectorized solution working on all the data at the same time"""
    n = X.size
    leftmask = np.arange(n) < S[:,None] # boolean, shape (nS, n)
    rightmask = ~leftmask # boolean, shape (nS, n)
    isone = Y == 1 # shape (n,)
    lval = (isone & leftmask).sum(axis=1) # shape (nS,)
    rval = (isone & rightmask).sum(axis=1) # shape (nS,)
    hleft = 1 - (lval**2 + (S - lval)**2) / n**2
    hright = 1 - (rval**2 + (n - S - rval)**2) / n**2
    return - S / n * hleft - (n - S) / n * hright # shape (nS,)

def time_vector():
    """Trivial front-end for fair timing"""
    return f3(X,Y,S)
```

Defining your original solution so that it runs as `time_orig()`, we can check that the results are the same:

```
>>> np.array_equal(time_orig(), time_newloop()), np.array_equal(time_orig(), time_vector())
(True, True)
```

And the runtimes with the random input above:

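```
>>> %timeit time_orig()
... %timeit time_newloop()
... %timeit time_vector()
...
19 ms ± 501 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.4 ms ± 214 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.93 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

This means the improved loop above is roughly 1.7 times as fast as the original loop (11.4 ms vs. 19 ms). The vectorized version is about three times faster again (3.93 ms).

Of course, the latter improvement comes at the cost of memory. Instead of arrays of shape `(n,)`, you now have arrays of shape `(nS, n)`, which can get very large if your input arrays are large. But, as they say, there's no free lunch: with vectorization you often pay for speed with memory.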
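In this particular problem, the memory cost of the `(nS, n)` mask can probably be avoided. One alternative, which is not part of the original answer, is a prefix sum. The number of ones in `Y[:s]` is the running count of ones up to position `s`, so a single `np.cumsum` yields every left count at once.

Below is a minimal sketch under these assumptions:

- `S` holds integer split indices between 0 and `n`.
- The function name `f_cumsum` and the dummy data are hypothetical.

```python
import numpy as np


def f_cumsum(X, Y, S):
    """Prefix-sum variant of the vectorized solution (hypothetical name).

    Assumes S holds integer split indices in the range [0, n].
    Uses O(n + nS) memory instead of an (nS, n) boolean mask.
    """
    n = X.size
    S = np.asarray(S)
    isone = Y == 1
    # ones_before[k] = number of ones in Y[:k], for k = 0..n
    ones_before = np.concatenate(([0], np.cumsum(isone)))
    lval = ones_before[S]            # ones in each left slice, shape (nS,)
    rval = ones_before[-1] - lval    # ones in each right slice, shape (nS,)
    hleft = 1 - (lval**2 + (S - lval)**2) / n**2
    hright = 1 - (rval**2 + (n - S - rval)**2) / n**2
    return -S / n * hleft - (n - S) / n * hright   # shape (nS,)


# Usage with hypothetical dummy input: binary Y, non-empty slices
rng = np.random.default_rng(0)
n = 1000
X = rng.integers(0, n, n)
Y = rng.integers(0, 2, n)
S = rng.integers(1, n, 500)
results = f_cumsum(X, Y, S)
```

The per-split arithmetic is the same as in `f3`; only the way the left counts are obtained changes, from a mask sum to an index lookup. No timings are claimed for this variant here, so it is worth measuring against `time_vector()` on your own data.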

Compute a function on array slices in a vectorized way
