Consider the following contrived Cython function to join a list of strings:
# cython: language_level=3

cpdef test_join():
    """ ["abc", "def", "ghi"] -> "abcdefghi" """
    cdef:
        list lines = ["abc", "def", "ghi"]
        char* out = ""
        char* line = ""
        int i
    for i in range(len(lines)):
        line = lines[i]
        out = out + line
    return out
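It fails to compile with this error:

Storing unsafe C derivative of temporary Python reference

I assume this has to do with line being of type char* and being continually reassigned. I have seen an answer to a similar question, but have not been able to adapt that answer to this basic example. (It also involves a good deal of C-API that I am not familiar with.)

How can I modify the function above so that it compiles and returns the expected result?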
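More broadly, I would like to understand this error better. Commit 37e4a20 has a bit of explanation:

"Taking a char* from a temporary Python string object ... a compile-time error is only raised when such a pointer is assigned to a variable and would thus exceed the lifetime of the string itself."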
Update: to simplify things a bit further, it looks like it is the assignment that causes the problem:
cpdef int will_succeed():
    cdef char* a = b"hello"
    cdef char* b = b" world"
    print(a + b)  # no new assignment
    return 1

cpdef will_fail():
    cdef char* a = b"hello"
    cdef char* b = b" world"
    a = a + b  # won't compile
    return a
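I suspect there may be a more appropriate way to do this using something from string.pxd / string.h, but I am pretty weak on C memory management and efficiency: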
from libc.string cimport strcat, strcpy

cpdef use_strcat():
    cdef char out[1024]
    strcpy(out, b"")
    cdef char* a = b"hello"
    cdef char* b = b" world"
    strcat(out, a)
    strcat(out, b)
    return out
Answer 0 (score: 2)
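I think the problem is the line

out = out + line

Cython does not define an operator + for C strings. Instead, it converts them to Python strings (for a char*, normally bytes objects) and concatenates those. In pseudocode:

tmp1 = str(out)
tmp2 = str(line)
tmp3 = tmp1 + tmp2
out = get_c_string_from(tmp3)

out now points into the buffer owned by tmp3, so it becomes an invalid pointer as soon as tmp3 is destroyed.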
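If raw C pointers are not actually needed, one way to sidestep the dangling pointer is to let Python objects own the data the whole time. The sketch below is plain Python, which Cython also accepts; the name join_lines is hypothetical, and encoding str items as UTF-8 is an assumption about what the caller wants.

```python
def join_lines(lines):
    """[b"abc", b"def", b"ghi"] -> b"abcdefghi" (hypothetical helper)."""
    # Encode str items so the result is always bytes; bytes items pass through.
    parts = [s.encode("utf-8") if isinstance(s, str) else s for s in lines]
    # bytes.join builds one new object that owns its own buffer,
    # so no pointer into a temporary is ever stored.
    return b"".join(parts)
```

A char* taken from the returned object stays valid for as long as a reference to that object is kept.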
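I would avoid strcat, since it is not very efficient for repeated uses. Instead, keep track of the current string length and copy the data yourself. Since the total length is not known in advance, you probably want to allocate the string with malloc (in which case you are responsible for freeing it):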
from libc.stdlib cimport free, malloc, realloc
from libc.string cimport memcpy

cpdef bytes join_bytes(list lines):
    """ [b"abc", b"def", b"ghi"] -> b"abcdefghi" """
    cdef char *line
    cdef char *tmp
    cdef Py_ssize_t i
    cdef Py_ssize_t length = 0
    cdef Py_ssize_t incrlength
    cdef char *out = <char *>malloc(1)  # reallocated as needed
    if out == NULL:
        raise MemoryError()
    try:
        out[0] = 0  # keep C strings null-terminated
        for i in range(len(lines)):
            line = lines[i]  # items must be bytes; the list keeps them alive
            incrlength = len(line)  # strlen() for a char*
            tmp = <char *>realloc(out, length + incrlength + 1)
            if tmp == NULL:
                raise MemoryError()  # out is still valid and is freed below
            out = tmp
            memcpy(out + length, line, incrlength)
            length += incrlength
            out[length] = 0  # keep C strings null-terminated
        return out  # copied into a new Python bytes object before free() runs
    finally:
        free(out)
This is a rough outline of what I think you should do, and it has not actually been tested.
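To make the strcat concern concrete, here is a small pure-Python model. It is a sketch with hypothetical helper names, not a benchmark of real C code. It assumes that strcat has to walk the destination to find its terminating null on every call, and counts those scanned bytes. The second function mirrors the tracked-length approach by writing each piece at a remembered offset.

```python
def strcat_scan_cost(pieces):
    """Count the bytes a strcat-style loop scans while looking for the terminator."""
    scanned = 0
    length = 0  # current length of the destination string
    for piece in pieces:
        scanned += length  # strcat walks the whole destination first
        length += len(piece)
    return scanned


def tracked_append(pieces):
    """Append each piece at a remembered offset, as in the memcpy approach."""
    out = bytearray()
    length = 0
    for piece in pieces:
        out[length:length + len(piece)] = piece  # copy at the known offset
        length += len(piece)
    return bytes(out)
```

For n pieces of equal size the scanned count grows roughly with n squared, whereas the tracked-length version never rescans what it has already written.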