Question

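I am reading the source code of Python's NumPy library and found the following snippet. It appears to perform element-wise operations on vectors (numpy.ndarray). For example, numpy.multiply([1,2,3],[4,5,6]) produces the result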

[4,10,18]

#define BASE_UNARY_LOOP(tin, tout, op) \
    UNARY_LOOP { \
        const tin in = *(tin *)ip1; \
        tout * out = (tout *)op1; \
        op; \
    }
#define UNARY_LOOP_FAST(tin, tout, op) \
    do { \
    /* condition allows compiler to optimize the generic macro */ \
    if (IS_UNARY_CONT(tin, tout)) { \
        if (args[0] == args[1]) { \
            BASE_UNARY_LOOP(tin, tout, op) \
        } \
        else { \
            BASE_UNARY_LOOP(tin, tout, op) \
        } \
    } \
    else { \
        BASE_UNARY_LOOP(tin, tout, op) \
    } \
    } \
    while (0)

This looks very strange to me, especially the comment inside UNARY_LOOP_FAST. Every branch of the if/else expands to the same BASE_UNARY_LOOP call, so what is this logic actually doing as an optimization?

Answer 1

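Without more context, it is not possible to say which optimization the numpy snippet enables, but it is probably similar to this simplified example: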

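/* Expands to a loop over i that runs op once per element; relies on n being in scope. */
#define LOOP(op) for (int i = 0; i < n; i++) op

void f(int *a, int *b, int n, int c) {
  if (c == 1) {
    /* Same text as the else branch, but here the compiler knows c == 1. */
    LOOP(a[i] += b[i] * c);
  }
  else {
    /* General case: c is unknown, so the multiplication stays. */
    LOOP(a[i] += b[i] * c);
  }
}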

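Modern compilers can eliminate the multiplication in the first branch, because c is known to be 1 there. In the example above you could simply write LOOP(a[i] += b[i]) in the first branch yourself, but that is not possible when the if statement is part of another macro that takes op as a parameter.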

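The basic idea is to force the compiler to generate code for multiple paths, some of which have preconditions that it can exploit for certain optimizations.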
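The macro case mentioned above can be sketched roughly as follows. This is only an illustration: LOOP_FAST and scaled_add are hypothetical names made up for this example, not anything from NumPy, and whether the multiplication is actually removed depends on the compiler and optimization level.

```c
/* Generic loop: runs `op` once per index i; expects n to be in scope. */
#define LOOP(op) for (int i = 0; i < n; i++) { op; }

/* Hypothetical wrapper macro. It cannot rewrite `op`, so both branches
 * expand to the same text. Inside the first branch, however, the compiler
 * may assume c == 1 and simplify the expanded expression on its own. */
#define LOOP_FAST(op) \
    do { \
        if (c == 1) { \
            LOOP(op) \
        } \
        else { \
            LOOP(op) \
        } \
    } while (0)

/* Computes a[i] += b[i] * c for every element. */
void scaled_add(int *a, const int *b, int n, int c) {
    LOOP_FAST(a[i] += b[i] * c);
}
```

Calling scaled_add(a, b, n, 1) and scaled_add(a, b, n, 2) gives the same results as a plain loop; only the generated machine code for the two paths may differ.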

Answer 2

This snippet is from

https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/loops.c.src

This particular snippet defines the looping macros for single-argument ufuncs, such as np.abs.

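The comment preceding this snippet is

/*
 * loop with contiguous specialization
 * op should be the code working on tin in and
 * storing the result in tout * out
 * combine with NPY_GCC_OPT_3 to allow autovectorization
 * should only be used where it's worthwhile, to avoid code bloat
 */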

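The ufunc design allows calls such as np.sin(a, out=b). Apparently the macro tells the compiler to consider the special case where the tout array is the same as the tin array, e.g. np.sin(a, out=a).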
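A minimal sketch of that dispatch, written as plain C rather than NumPy's templated source. The names negate_loop and NEGATE_LOOP are hypothetical, the element type is fixed to double, and the contiguity test is only assumed to resemble what IS_UNARY_CONT checks.

```c
#include <stddef.h>

/* Simplified stand-in for BASE_UNARY_LOOP: walks n elements using byte
 * strides is1 (input) and os1 (output), negating each value. */
#define NEGATE_LOOP \
    for (ptrdiff_t i = 0; i < n; i++) { \
        const double in = *(const double *)(ip1 + i * is1); \
        double *out = (double *)(op1 + i * os1); \
        *out = -in; \
    }

void negate_loop(char *ip1, char *op1, ptrdiff_t n,
                 ptrdiff_t is1, ptrdiff_t os1) {
    const ptrdiff_t size = (ptrdiff_t)sizeof(double);
    if (is1 == size && os1 == size) {
        /* Contiguous case: strides are known constants in both branches. */
        if (ip1 == op1) {
            NEGATE_LOOP  /* in place, like np.sin(a, out=a) */
        }
        else {
            NEGATE_LOOP  /* separate output, like np.sin(a, out=b) */
        }
    }
    else {
        NEGATE_LOOP      /* generic strided fallback */
    }
}
```

All three branches compute the same result. The point of the duplication is that each copy is compiled with different facts known about the strides and pointers.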

Similarly, the fast binary ufunc macros allow for identity among the three arrays in np.add(a, b, out=c), which could be implemented as c=a+b, a+=b, or b+=a.
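The three binary cases could be sketched like this. It is a hedged illustration only: add_loop and ADD_LOOP are hypothetical names, the arrays are assumed contiguous, and the real NumPy macros handle more cases than shown here.

```c
#include <stddef.h>

/* Hypothetical element-wise add over contiguous double arrays. */
#define ADD_LOOP \
    for (ptrdiff_t i = 0; i < n; i++) { \
        out[i] = in1[i] + in2[i]; \
    }

void add_loop(const double *in1, const double *in2, double *out,
              ptrdiff_t n) {
    if (out == in1) {
        ADD_LOOP  /* a += b */
    }
    else if (out == in2) {
        ADD_LOOP  /* b += a */
    }
    else {
        ADD_LOOP  /* c = a + b */
    }
}
```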

These timing differences suggest a modest optimization in the case where args[0] == args[1]:

In [195]: a=np.ones((100,100))
In [197]: %%timeit b=np.ones((100,100))
     ...: np.sin(a, out=b)
1000 loops, best of 3: 343 µs per loop

In [198]: %%timeit b=np.ones((100,100))
     ...: np.sin(b, out=b)
1000 loops, best of 3: 279 µs per loop

How does a redundant if-else help optimization?
