Question

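I'm looking for the fastest way to select the elements of a numpy array that satisfy several conditions. As an example, say I want to select all elements between 0.2 and 0.8 from an array. I usually do something like this: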

import numpy as np

the_array = np.random.random(100000)
idx = (the_array > 0.2) * (the_array < 0.8)
selected_elements = the_array[idx]

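However, this creates two additional arrays the same size as the_array (one for the_array > 0.2 and one for the_array < 0.8). If the array is large, this can consume a lot of memory. Is there any way around this? All the built-in numpy functions (such as logical_and) seem to do the same thing under the hood.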

Answer 1

You could implement a custom C routine for the selection and call it from Python. The most basic way to do that is through ctypes.

select.c

int select(float lower, float upper, float* in, float* out, int n)
{
  int ii;
  int outcount = 0;
  float val;
  for (ii=0;ii<n;ii++)
    {
      val = in[ii];
      if ((val>lower) && (val<upper))
        {
          out[outcount] = val;
          outcount++;
        }
    }
  return outcount;
}

Compile with:

gcc -shared -fPIC select.c -o lib.so

On the Python side:

select.py

import ctypes as C
from numpy.ctypeslib import as_ctypes
import numpy as np

# open the library in python
lib = C.CDLL("./lib.so")

# explicitly tell ctypes the argument and return types of the function
pfloat = C.POINTER(C.c_float)
lib.select.argtypes = [C.c_float,C.c_float,pfloat,pfloat,C.c_int]
lib.select.restype = C.c_int

size = 1000000

# create numpy arrays
np_input  = np.random.random(size).astype(np.float32)
np_output = np.empty(size, dtype=np.float32)

# expose the array contents to ctypes
ctypes_input = as_ctypes(np_input)
ctypes_output = as_ctypes(np_output)

# call the function and get the number of selected points
outcount = lib.select(0.2,0.8,ctypes_input,ctypes_output,size)

# select those points 
selected = np_output[:outcount]

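Don't expect wild speedups from a vanilla implementation like this, but on the C side you have the option of adding OpenMP pragmas for quick-and-dirty parallelism, which may give you a significant boost.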

Also, as mentioned in the comments, numexpr may be a quicker way to do all of this in just a few lines.
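For reference, here is a rough sketch of what the numexpr route might look like. The helper name select_between is made up for illustration, and the claim about memory is my understanding rather than something measured here: numexpr evaluates the expression in blocks, so the two intermediate comparison results should not be materialized as full-size arrays, and only the final boolean mask is allocated. If numexpr is not installed, the sketch falls back to plain NumPy with an in-place AND, which still needs one temporary.

```python
import numpy as np

try:
    import numexpr as ne  # optional third-party dependency
except ImportError:
    ne = None


def select_between(a, lower, upper):
    """Return the elements of `a` strictly between `lower` and `upper`.

    Hypothetical helper, not part of numpy or numexpr.
    """
    if ne is not None:
        # The whole condition is evaluated in one pass over `a`.
        mask = ne.evaluate(
            "(a > lower) & (a < upper)",
            local_dict={"a": a, "lower": lower, "upper": upper},
        )
    else:
        # Fallback: one boolean array plus one temporary for the second test.
        mask = a > lower
        mask &= a < upper
    return a[mask]


the_array = np.random.random(100000)
selected_elements = select_between(the_array, 0.2, 0.8)
```

Whether this actually beats the plain boolean-mask version depends on the array size and the machine, so it is worth timing both on your own data.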

Efficiently selecting elements from a numpy array using multiple conditions
