Storing unsafe C derivative of temporary Python reference

Date: 2018-12-01 17:46:05

Tags: cython

Consider the following contrived Cython function that joins a list of strings:

# cython: language_level=3
cpdef test_join():
    """ ["abc", "def", "ghi"] -> "abcdefghi" """
    cdef:
        list lines = ["abc", "def", "ghi"]
        char* out = ""
        char* line = ""
        int i
    for i in range(len(lines)):
        line = lines[i]
        out = out + line
    return out

It fails to compile with the following error:

Storing unsafe C derivative of temporary Python reference

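I assume this has something to do with line being a char* that keeps getting reassigned. I've seen the answer to a similar question, but haven't managed to adapt it to this basic example. (It also involves a lot of the C-API that I'm not familiar with.)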

How can I modify the function above so that it compiles and returns the expected result?


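More broadly, I'd like to understand this error better. Commit 37e4a20 has some explanation:

Obtaining a char* from a temporary Python string object ... A compile-time error is raised only when such a pointer is assigned to a variable and would therefore outlive the string itself.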


Update: to simplify things further, it looks like the assignment is what causes the problem:

cpdef int will_succeed():
    cdef char* a = b"hello"
    cdef char* b = b" world"
    print(a + b)  # no new assignment
    return 1

cpdef will_fail():
    cdef char* a = b"hello"
    cdef char* b = b" world"
    a = a + b  # won't compile
    return a

I suspect there is a more appropriate approach using something from string.pxd / string.h, but I'm weak on C memory management and efficiency:

from libc.string cimport strcat, strcpy

cpdef use_strcat():
    cdef char out[1024]
    strcpy(out, b"")

    cdef char* a = b"hello"
    cdef char* b = b" world"

    strcat(out, a)
    strcat(out, b)
    return out

1 answer

Answer 0 (score: 2)

I think the problem is the line

out = out + line

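Cython doesn't define a + operator for C strings. Instead, it converts them to Python (bytes) objects and concatenates those, roughly:

tmp1 = bytes(out)              # new temporary Python object
tmp2 = bytes(line)             # another temporary
tmp3 = tmp1 + tmp2             # temporary holding the concatenation
out = get_c_string_from(tmp3)  # out now points into tmp3's internal buffer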

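out therefore becomes a dangling pointer as soon as tmp3 is destroyed, which happens once the statement finishes because nothing else holds a reference to it. That is exactly the situation the compile-time error guards against.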


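I'd avoid strcat, since it's not very efficient for repeated use: every call has to scan the destination from the beginning to find its terminating NUL, so the total cost grows roughly quadratically with the number of pieces appended. Instead, keep track of the current string length and copy the data yourself. Assuming the total length isn't known in advance, you'll probably want to allocate the string with malloc (in which case you're responsible for freeing it):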

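from libc.stdlib cimport free, malloc, realloc
from libc.string cimport memcpy

cpdef bytes test_join(list lines):
    """ [b"abc", b"def", b"ghi"] -> b"abcdefghi" """
    cdef bytes item          # keeps each piece alive while line points into it
    cdef char *line
    cdef char *tmp
    cdef Py_ssize_t i        # Py_ssize_t is built into Cython, no import needed
    cdef Py_ssize_t length = 0
    cdef Py_ssize_t incrlength
    cdef char *out = <char *>malloc(1)  # reallocate as needed
    if out == NULL:
        raise MemoryError()

    try:
        out[0] = 0  # keep C strings null-terminated
        for i in range(len(lines)):
            item = lines[i]  # must be bytes under language_level=3
            line = item      # safe: item is a named variable, not a temporary
            incrlength = len(item)
            tmp = <char *>realloc(out, length + incrlength + 1)
            if tmp == NULL:
                raise MemoryError()  # out is still valid and is freed below
            out = tmp
            memcpy(out + length, line, incrlength)
            length += incrlength
            out[length] = 0  # keep C strings null-terminated
        return out[:length]  # copies the data into a new Python bytes object
    finally:
        free(out)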

This is a rough outline of what I think you should do; it hasn't actually been tested.