Question

I'm trying to run Python embedded in a simple C program. However, when I import a module, I get the error undefined symbol: PyUnicodeUCS2_DecodeUTF8.

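On further investigation, I found that the Python interpreter started with Py_Initialize(); uses UCS-4, while the module I'm trying to import uses UCS-2. Is there a way to initialize the Python interpreter with the correct encoding? I'm on a CentOS 7 Linux system, which mainly uses UCS-2, but I don't know why the embedded interpreter uses UCS-4.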

C code: embed.c

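#include <Python.h>
int main (int argc, char *argv[])
{
  PyObject *pName, *pModule;
  if (argc < 2) { fprintf(stderr, "usage: %s module\n", argv[0]); return 1; }
  Py_Initialize();
  pName = PyString_FromString(argv[1]); //get name of module to import
  pModule = PyImport_Import(pName);
  Py_DECREF(pName);
  if (pModule == NULL) PyErr_Print(); //prints the ImportError, e.g. the undefined symbol
  Py_XDECREF(pModule);
  Py_Finalize();
  return 0;
}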

Python

import sys
print(__file__ + ": Encoding: " + str(sys.maxunicode))  # how I printed the interpreter encoding; it prints 1114111
import torch
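A minimal sketch of how to read the value printed by that script (the helper name `unicode_build` is hypothetical, not part of any library): on Python 2, `sys.maxunicode` is 65535 (0xFFFF) on a narrow UCS-2 build and 1114111 (0x10FFFF) on a wide UCS-4 build, so 1114111 means the embedded interpreter is a UCS-4 build. The code runs on Python 2 and 3; on Python 3.3+ it always reports UCS-4.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Map sys.maxunicode to the internal code-unit width of a Python 2 build."""
    if maxunicode == 0xFFFF:      # 65535: narrow build, configured with UCS-2
        return "UCS-2"
    if maxunicode == 0x10FFFF:    # 1114111: wide build, configured with UCS-4
        return "UCS-4"
    raise ValueError("unexpected sys.maxunicode: %r" % maxunicode)


print(unicode_build())
```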

Makefile

gcc -I /usr/include/python2.7 embed.c -o embed -lpython2.7

The code compiles, but fails with the following error message: undefined symbol: PyUnicodeUCS2_DecodeUTF8.

Answer 1

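There is no way to initialize the interpreter with the right encoding: whether the interpreter uses UCS-2 or UCS-4 is a compile-time choice. What you need to do is recompile the whole module from source against the interpreter you are embedding. If you don't have the module's source code, you have to build Python 2.7 from source with UCS-2 (configure --enable-unicode=ucs2), taking care not to replace the system Python 2.7 with it.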
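One way to confirm which build a module expects is to look at the Unicode symbols it imports. In Python 2.7, `unicodeobject.h` renames the `PyUnicode_*` API to `PyUnicodeUCS2_*` or `PyUnicodeUCS4_*` depending on the build, so an extension compiled against a UCS-2 Python references names that a UCS-4 libpython does not export. The following is a Python 3 diagnostic sketch, not part of the original answer; it assumes GNU `nm` is on the PATH, and both function names are hypothetical.

```python
import re
import subprocess

# The compile-time renaming leaves the build width in the imported symbol names.
_UCS_SYMBOL = re.compile(r"\bPyUnicodeUCS([24])_\w+")


def ucs_width_from_symbols(nm_output):
    """Return 2 or 4 for the PyUnicodeUCS*_ names in an nm listing, or None if none appear."""
    widths = {int(m.group(1)) for m in _UCS_SYMBOL.finditer(nm_output)}
    if len(widths) > 1:
        raise ValueError("listing references both UCS2 and UCS4 symbols")
    return widths.pop() if widths else None


def module_ucs_width(so_path):
    """Run GNU nm on a shared object and report which Unicode build it was compiled for."""
    result = subprocess.run(
        ["nm", "-D", "--undefined-only", so_path],  # only the symbols the .so imports
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return ucs_width_from_symbols(result.stdout)
```

For example, if `module_ucs_width("/path/to/extension.so")` (a placeholder path) returns 2 while the embedded interpreter is UCS-4, that is exactly the `PyUnicodeUCS2_DecodeUTF8` mismatch from the question.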

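The UCS-2 build is considered buggy, because non-BMP characters are represented there as UTF-16 surrogate pairs, which are visible as separate code points. That is why Python 3.3 and later no longer have this compile-time option: they always use UCS-4 internally for strings that cannot be represented in UCS-2.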
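A short Python 3 sketch of what the surrogate-pair representation looks like (the function name `to_utf16_units` is hypothetical): a code point above U+FFFF does not fit in one 16-bit unit, so a UCS-2 build stores it as a high surrogate plus a low surrogate, and string operations then see two "characters".

```python
def to_utf16_units(text):
    """Return the 16-bit code units a narrow (UCS-2) build would store for text."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Non-BMP code point: split into a UTF-16 surrogate pair.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high surrogate, top 10 bits
            units.append(0xDC00 + (cp & 0x3FF))  # low surrogate, bottom 10 bits
        else:
            units.append(cp)
    return units


print([hex(u) for u in to_utf16_units("\U0001F600")])  # ['0xd83d', '0xde00']
```

This is why `len(u'\U0001F600')` is 2 on a narrow Python 2 build but 1 on a wide build or on Python 3.3+.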

Starting the Python interpreter from C under UCS-2
