Question

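I have run into some interesting memory behavior using numpy + cython while trying to get the data of a numpy array as a C array, for use in a GIL-free function. I have looked at both the cython and the numpy array APIs, but I haven't found any explanation. Consider the following lines of code: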

cdef np.float32_t *a1 = <np.float32_t *>np.PyArray_DATA(np.empty(2, dtype="float32"))
print "{0:x}".format(<unsigned int>a1)
cdef np.float32_t *a2 = <np.float32_t *>np.PyArray_DATA(np.empty(2, dtype="float32"))
print "{0:x}".format(<unsigned int>a2)

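I allocate two numpy arrays with numpy's empty function and want to retrieve a pointer to the data buffer of each. You would expect these two pointers to point to two different memory addresses on the heap, possibly 2 * 4 bytes apart. But no, I get pointers to the same memory address, e.g.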

>>>96a7aec0
>>>96a7aec0

Why is that? I managed to work around this by declaring my numpy arrays outside of the PyArray_DATA call, in which case I get what I expect.

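The only explanation I can think of is that I don't create any Python object outside the scope of the PyArray_DATA call, and calling this function does not increment the Python reference count. The GC therefore reclaims this memory right afterwards, and the next array is allocated at the previous, now free, memory address. Could someone more cython-savvy than me confirm this or give another explanation?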

Answer 1

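You created two temporary numpy arrays, and they happen to end up at the same address. Since no Python references are kept for them, they are garbage collected immediately, and a1 and a2 both become dangling pointers.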
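The lifetime rule behind this can be observed from plain Python, without numpy or cython. The sketch below is only an illustration under stated assumptions: it relies on CPython's reference counting, and `Buffer`, `temporary_is_alive` and `named_is_alive` are hypothetical names made up for this example, standing in for an array object that owns its memory.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for an array object that owns a block of memory."""

    def __init__(self, n):
        self.data = bytearray(n)


def temporary_is_alive():
    # No name is bound to the Buffer, so under CPython's reference counting
    # it is freed as soon as the call that received it returns.
    ref = weakref.ref(Buffer(8))
    return ref() is not None


def named_is_alive():
    # Binding the object to a name keeps its reference count above zero
    # for as long as that name is in scope.
    keep_me = Buffer(8)
    ref = weakref.ref(keep_me)
    return ref() is not None


print(temporary_is_alive())  # False on CPython
print(named_is_alive())      # True
```

The same thing happens to the array passed straight into PyArray_DATA: once the call returns, nothing owns the array any more, so the allocator is free to hand the same block to the next array.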

If references are kept for them, both arrays stay alive and their addresses will differ, e.g.:

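# A memoryview holds a reference, so the numpy array is not garbage collected.
# dtype=np.intc matches the C int element type of int[:] on every platform.
cdef int[:] a = np.arange(10, dtype=np.intc)
cdef int[:] b = np.arange(10, dtype=np.intc)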
cdef int* a_ptr = &a[0]
cdef int* b_ptr = &b[0]
print(<size_t>a_ptr)
print(<size_t>b_ptr)

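You must be careful when working with an object's underlying data. Used carelessly, it often leaves you with dangling pointers. For example: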

# cfunc is a C function declared elsewhere as: void cfunc(const char*)
# Fortunately, this won't compile in cython.
# Error: Storing unsafe C derivative of temporary Python reference
cdef const char* s = ("won't" + " compile").encode()
cfunc(s)

The right way:

# Make sure keep_me stays alive until cfunc has finished with it.
cdef bytes keep_me = ("right" + "way").encode()
cfunc(keep_me)
# Or, for single use:
cfunc(("right" + "way").encode())

Another example, with the C++ std::string member c_str():

// The result of `+` is destroyed at the end of this statement, so cfunc gets a dangling pointer.
const char * s = (string("not") + string("good")).c_str();
cfunc(s);

The right way:

// Keep `keep_me` alive for later use.
string keep_me = string("right") + string("way");
cfunc(keep_me.c_str());
// Or, for single use: the temporary lives until the end of the full expression.
cfunc((string("right") + string("way")).c_str());

Reference: std::string::c_str() and temporaries
