Question

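We decided to expose one of our IPC (inter-process communication) modules, written in C++, to Python (I know, it's not the brightest idea). We use data packets that can be serialized to and deserialized from std::string (behavior similar to Protocol Buffers, just not as efficient), so our IPC class also returns and accepts std::string.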

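The problem with exposing that class to Python is that the C++ std::string type is converted to the Python str type, and if the returned std::string consists of characters that cannot be decoded as UTF-8 (which is most of the time), I get a UnicodeDecodeError exception.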

I managed to find two workarounds (or even "solutions"?) for this problem, but I'm not entirely happy with either of them.

Here is my C++ code to reproduce the UnicodeDecodeError problem and try out the workarounds:

/*
 * boost::python string problem
 */

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/python.hpp>
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>

struct Packet {
    std::string serialize() const {
        char buff[sizeof(x_) + sizeof(y_)];
        std::memcpy(buff, &x_, sizeof(x_));
        std::memcpy(buff + sizeof(x_), &y_, sizeof(y_));
        return std::string(buff, sizeof(buff));
    }
    bool deserialize(const std::string& buff) {
        if (buff.size() != sizeof(x_) + sizeof(y_)) {
            return false;
        }
        std::memcpy(&x_, buff.c_str(), sizeof(x_));
        std::memcpy(&y_, buff.c_str() + sizeof(x_), sizeof(y_));
        return true;
    }
    // whatever ...
    int x_;
    float y_;
};

class CommunicationPoint {
public:
    std::string read() {
        // in my production code I read that std::string from the other communication point of course
        Packet p;
        p.x_ = 999;
        p.y_ = 1234.5678;
        return p.serialize();
    }
    std::vector<uint8_t> readV2() {
        Packet p;
        p.x_ = 999;
        p.y_ = 1234.5678;
        std::string buff = p.serialize();
        std::vector<uint8_t> result;
        std::copy(buff.begin(), buff.end(), std::back_inserter(result));
        return result;
    }
    boost::python::object readV3() {
        Packet p;
        p.x_ = 999;
        p.y_ = 1234.5678;
        std::string serialized = p.serialize();
        char* buff = new char[serialized.size()];  // here valgrind detects leak
        std::copy(serialized.begin(), serialized.end(), buff);
        PyObject* py_buf = PyMemoryView_FromMemory(
            buff, serialized.size(), PyBUF_READ);
        auto retval = boost::python::object(boost::python::handle<>(py_buf));
        //delete[] buff;  // if I execute delete[] I get garbage in python
        return retval;
    }
};

BOOST_PYTHON_MODULE(UtfProblem) {
    boost::python::class_<std::vector<uint8_t> >("UintVec")
        .def(boost::python::vector_indexing_suite<std::vector<uint8_t> >());
    boost::python::class_<CommunicationPoint>("CommunicationPoint")
        .def("read", &CommunicationPoint::read)
        .def("readV2", &CommunicationPoint::readV2)
        .def("readV3", &CommunicationPoint::readV3);
}

It can be compiled with the following command (in production we use CMake, of course):

g++ -g -fPIC -shared -o UtfProblem.so -lboost_python-py35 -I/usr/include/python3.5m/ UtfProblem.cpp

Here is a short Python script that loads my library and decodes the numbers:

import UtfProblem
import struct

cp = UtfProblem.CommunicationPoint()

#cp.read()  # exception

result = cp.readV2()
# result is UintVec type, so I need to convert it to bytes first
intVal = struct.unpack('i', bytes([x for x in result[0:4]]))
floatVal = struct.unpack('f', bytes([x for x in result[4:8]]))
print('intVal: {} floatVal: {}'.format(intVal, floatVal))

result = cp.readV3().tobytes()
intVal = struct.unpack('i', result[0:4])
floatVal = struct.unpack('f', result[4:8])
print('intVal: {} floatVal: {}'.format(intVal, floatVal))

In the first workaround, I return std::vector<uint8_t> instead of std::string. It works, but I don't like that it forces me to expose an additional, artificial Python type, UintVec, which has no native support for conversion to Python bytes.

The second workaround is nicer because it exposes the serialized packet as a block of memory with native support for conversion to bytes, but it leaks memory. I verified the leak with valgrind:

valgrind --suppressions=../valgrind-python.supp --leak-check=yes -v --log-file=valgrindLog.valgrind python3 UtfProblem.py

Apart from a lot of invalid reads from the Python library (probably false positives), it reports

8 bytes in 1 blocks are definitely lost

at the line where I allocate memory for the buffer. If I delete the memory before returning from the function, I get garbage in Python.

Questions:

How can I properly expose serialized data to Python? In C++ we typically represent an array of bytes as std::string or const char*, which unfortunately does not port nicely to Python.

If my second workaround looks OK to you, how can I avoid the memory leak?

If exposing the return value as std::string is generally OK, how can I avoid the UnicodeDecodeError?

Additional information:

g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Python 3.5.3
Boost 1.62

Answer 1

As suggested in AntiMatterDynamite's comment, returning a Pythonic bytes object through the Python C API works perfectly fine:

PyObject* read() {
    Packet p;
    p.x_ = 999;
    p.y_ = 1234.5678;
    std::string buff = p.serialize();
    return PyBytes_FromStringAndSize(buff.c_str(), buff.size());
}
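On the Python side, the caller can then unpack the returned bytes object directly. The following is only a minimal sketch: the extension module is not available here, so the payload is simulated with struct.pack, and the little-endian '<if' layout (int followed by float, as on x86) is an assumption rather than something the C++ code guarantees.

```python
import struct

# Stand-in for cp.read(): the same 8-byte layout that Packet::serialize()
# produces (int x_ followed by float y_). Little-endian is an assumption.
payload = struct.pack('<if', 999, 1234.5678)

# Arbitrary binary data is usually not valid UTF-8, which is why the
# std::string -> str conversion raised UnicodeDecodeError.
try:
    payload.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# A bytes object can be unpacked in a single call, without slicing.
int_val, float_val = struct.unpack('<if', payload)
print('decoded_ok: {} intVal: {} floatVal: {:.4f}'.format(
    decoded_ok, int_val, float_val))
```

Because the result is already bytes, no intermediate UintVec type or tobytes() call is needed.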

Answer 2

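I suggest you define your own return type class in C++ and expose it using Boost.Python. For example, you could have it implement the buffer protocol. Then you'll have a regular C++ destructor that is called at the appropriate time; you can even use a smart pointer inside the class to manage the lifetime of the allocated memory.

Once you've done that, the next question would be: why not have the returned object expose properties to access the fields, rather than making the caller use struct.unpack()? Then your calling code could be much simpler: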

result = cp.readV5()
print('intVal: {} floatVal: {}'.format(result.x, result.y))
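To illustrate the interface this would give the caller, here is a pure-Python stand-in for such a result type. It is not the C++/Boost.Python class described above; the name PacketView and the '<if' layout are hypothetical, chosen only to mirror the Packet fields from the question.

```python
import struct


class PacketView:
    """Hypothetical Python-side wrapper around a serialized Packet."""

    # Assumed layout: int x_ followed by float y_, little-endian.
    _LAYOUT = struct.Struct('<if')

    def __init__(self, raw):
        if len(raw) != self._LAYOUT.size:
            raise ValueError('expected {} bytes, got {}'.format(
                self._LAYOUT.size, len(raw)))
        self._x, self._y = self._LAYOUT.unpack(raw)

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y


# Simulated payload; in real code this would come from the IPC module.
result = PacketView(struct.pack('<if', 999, 1234.5))
print('intVal: {} floatVal: {}'.format(result.x, result.y))
```

With a type like this, the caller never touches struct.unpack() or byte offsets, which is the simplification this answer is aiming for.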
