Question

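I am trying to convert a list of Python strings into a 2D character array and then pass it to a C function.

Python version: 3.6.4, Cython version: 0.28.3, OS: Ubuntu 16.04

My first attempt looked like this: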

def my_function(name_list):
    cdef char name_array[50][30]

    for i in range(len(name_list)):
        name_array[i] = name_list[i]

The code builds, but at runtime I get the following:

Traceback (most recent call last):
  File "test.py", line 532, in test_my_function
    my_function(name_list)
  File "my_module.pyx", line 817, in my_module.my_function
  File "stringsource", line 93, in carray.from_py.__Pyx_carray_from_py_char
IndexError: not enough values found during array assignment, expected 25, got 2

I then tried to make sure the string on the right-hand side of the assignment was exactly 30 characters long:

def my_function(name_list):
    cdef char name_array[50][30]

    for i in range(len(name_list)):
        name_array[i] = (name_list[i] + ' '*30)[:30]

This produced a different error:

Traceback (most recent call last):
  File "test.py", line 532, in test_my_function
    my_function(name_list)
  File "my_module.pyx", line 818, in my_module.my_function
  File "stringsource", line 87, in carray.from_py.__Pyx_carray_from_py_char
TypeError: an integer is required

Any help would be appreciated. Thank you.

Answer 1

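Thanks to @ead for the reply. It led me to something that works. I don't think it is the best approach, but it will do for now.

As @ead suggested, I handled null termination by padding with null characters.

I then got a TypeError: string argument without an encoding error and had to encode the string before converting it to a bytearray. That is what the added .encode('ASCII') is for.

Here is the working code: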

def my_function(name_list):
    cdef char name_array[50][30]

    for i in range(len(name_list)):
        name_array[i] = bytearray((name_list[i] + '\0'*30)[:30].encode('ASCII'))
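
A plain-Python sketch of the right-hand side used above may help show why it is accepted where the earlier attempts were not. The helper name to_fixed_width and the constant NAME_LEN are made up for illustration. The assumption is that Cython's array assignment wants exactly 30 items, each convertible to a C char, which is consistent with the two tracebacks in the question.

```python
NAME_LEN = 30  # row width of name_array in the code above (illustrative constant)

def to_fixed_width(name, width=NAME_LEN):
    """Pad with NUL characters, cut to `width`, and encode as ASCII."""
    return bytearray((name + '\0' * width)[:width].encode('ascii'))

row = to_fixed_width("ab")
print(len(row))        # 30
print(list(row[:4]))   # [97, 98, 0, 0]

# Iterating over a str yields one-character str objects, not integers,
# whereas iterating over a bytearray yields integers.
print([type(c).__name__ for c in "ab"])  # ['str', 'str']
```

Note that a name of 30 or more characters fills the whole row and leaves no room for a terminating NUL, so the C side would have to treat such rows as fixed-width rather than null-terminated.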

Answer 2

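I don't like this Cython feature, and it seems at least not to have been thought through completely:

Using a char array is convenient because it avoids the hassle of allocating and freeing dynamic memory. Naturally, though, the allocated buffer is larger than the string it holds, so enforcing equal lengths makes no sense.
C strings are null-terminated. The \0 is not always needed, but it often is, so some extra steps are required to ensure it.

So I would roll my own solution: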

%%cython
from libc.string cimport memcpy

cdef int from_str_to_chararray(source, char *dest, size_t N, bint ensure_nullterm) except -1:
    cdef size_t source_len = len(source) 
    cdef bytes as_bytes = source.encode('ascii')    #hold reference to the underlying byte-object
    cdef const char *as_ptr = <const char *>(as_bytes)
    if ensure_nullterm:
        source_len+=1
    if source_len > N:
        raise IndexError("destination array too small")
    memcpy(dest, as_ptr, source_len)
    return 0

and then use it as follows:

%%cython
def test(name):
    cdef char name_array[30]
    from_str_to_chararray(name, name_array, 30, 1)
    print("In array: ", name_array)

A quick test gives:

>>> test("A")
In array: A
>>> test("A"*29)
In array: AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>> test("A"*30)
IndexError: destination array too small

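Some further notes on the implementation:

It is necessary to keep a reference to the underlying bytes object so that it stays alive; otherwise as_ptr would be left dangling right after it is created.
The internal representation of a bytes object has a trailing \0, so memcpy(dest, as_ptr, source_len) is safe even when source_len = len(source) + 1.
except -1 is needed so that the exception is actually propagated to, and checked in, the Python code.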
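
The same checks can be tried out in plain Python with ctypes, as a rough analogue rather than a replacement for the Cython helper. In this sketch the ctypes array types, the Row alias and the sample names are assumptions made for illustration. It applies the copy to each row of a 50x30 array, which is the shape asked about in the question, and takes the row size from the array itself instead of a separate argument.

```python
import ctypes

def from_str_to_chararray(source, dest, ensure_nullterm=True):
    """Copy `source` (an ASCII str) into the ctypes char array `dest`."""
    as_bytes = source.encode('ascii')   # keep a reference while copying
    n = len(as_bytes)
    if ensure_nullterm:
        n += 1                          # also copy the trailing NUL
    if n > ctypes.sizeof(dest):
        raise IndexError("destination array too small")
    # CPython stores a NUL after the last byte of a bytes object, so
    # reading len(as_bytes) + 1 bytes stays inside its buffer.
    ctypes.memmove(dest, as_bytes, n)

Row = ctypes.c_char * 30
name_array = (Row * 50)()               # analogue of char name_array[50][30]

names = ["alice", "bob"]                # sample data, not from the question
for i, name in enumerate(names):
    from_str_to_chararray(name, name_array[i])

print(name_array[0].value)  # b'alice'
print(name_array[1].value)  # b'bob'
```

As in the Cython version, a 29-character name fits with its terminator and a 30-character name raises IndexError.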

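Obviously, not everything is perfect: the array size has to be passed manually, which will lead to errors in the long run, whereas a version built into Cython could handle that automatically. But given that this functionality is currently missing from Cython, I think rolling your own version is the better option.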
