How does string conversion between PyUnicode strings and C strings work?

Time: 2016-03-18 19:55:50

Tags: python c python-3.x python-c-api python-internals

I have a PyUnicode object that I am trying to convert back to a C string (char *).

The way I am trying to do it does not seem to work. Here is my code:

PyObject * objectCompName = PyTuple_GET_ITEM(compTuple, (Py_ssize_t) 0);
PyObject * ooCompName = PyUnicode_AsASCIIString(objectCompName);
char * compName = PyBytes_AsString(ooCompName);
Py_DECREF(ooCompName);

Is there another/better way I should be doing this?

2 Answers:

Answer 0: (score: 5)

If a UTF-8 encoded char * is OK, you should definitely use PyUnicode_AsUTF8AndSize (requires Python 3.3 or later):

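PyObject * objectCompName = PySequence_GetItem(compTuple, 0);
if (! objectCompName) {
    return NULL;
}

// PyUnicode_AsUTF8AndSize takes a Py_ssize_t *, not a size_t *.
Py_ssize_t size;
const char *ptr = PyUnicode_AsUTF8AndSize(objectCompName, &size);
if (!ptr) {
    // PySequence_GetItem returned a new reference; release it on failure.
    Py_DECREF(objectCompName);
    return NULL;
}

// Note that the buffer pointed to by ptr belongs to objectCompName and is not
// guaranteed to outlive it, so copy it (perhaps with `strdup`) if you need to
// keep it, and Py_DECREF(objectCompName) once you are done with it.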
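
The copying step mentioned in the comment can be sketched in plain C, without the Python headers. This is only an illustration: copy_utf8_buffer is a hypothetical helper name, not part of the Python C API. It uses the length reported by PyUnicode_AsUTF8AndSize rather than strdup, because strdup stops at the first NUL byte and a Python str may contain U+0000.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the Python C API): copy `len` bytes
 * starting at `src` into freshly malloc'ed memory and append a NUL.
 * Returns NULL if src is NULL, len is too large, or allocation fails.
 * The caller owns the result and must free() it. */
char *copy_utf8_buffer(const char *src, size_t len)
{
    if (src == NULL || len == (size_t)-1) {
        return NULL;
    }
    char *copy = malloc(len + 1);   /* +1 for the terminating NUL */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, len);         /* copies embedded NUL bytes too */
    copy[len] = '\0';
    return copy;
}
```

In the answer's code this would presumably be called as copy_utf8_buffer(ptr, (size_t)size) after the NULL check and before objectCompName is released; the copy then stays valid regardless of what happens to the Python object.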

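Also, do understand that it is mandatory to check the return value of each and every Py* function call that you make in your code.

Here, PyTuple_GetItem will return NULL if compTuple is not a tuple, or if index 0 causes an IndexError. PyUnicode_AsUTF8AndSize will return NULL if objectCompName is not a str object. Ignore the return value, and CPython crashes with SIGSEGV when the conditions are right.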

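Answer 1: (score: 0)

You need to first convert your Python PyUnicode object into a non-unicode Python string (read more here: https://docs.python.org/2/c-api/unicode.html#ascii-codecs), and then you can easily convert the result into a char*.

Here is some pseudocode to help you move on: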

// Assumption: you have a variable named "pyobj" which is
// a pointer to an instance of PyUnicodeObject.

PyObject* temp = PyUnicode_AsASCIIString(pyobj);
if (NULL == temp) {
    // Means the string can't be converted to ASCII, the codec failed
    printf("Oh noes\n");
    return;
}

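// Get the actual bytes as a C string. PyUnicode_AsASCIIString returns a
// bytes object, so use PyBytes_AsString rather than PyByteArray_AsString.
char* c_str = PyBytes_AsString(temp);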

// Use the string in some manner
printf("The python unicode string is: %s\n", c_str);

// Make sure the temp stuff gets cleaned up at the end
Py_XDECREF(temp);