Question

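I have a strange problem with the serialization of a repeated double field in C++ protobuf. For practice, I chose time series data and tried to serialize/deserialize it in my application. I reproduced the error in a single .cpp file (see the full gist). Here is the core of the code that reads and writes the protobuf file, which I took from the examples: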

void writeMessage(::google::protobuf::Message &message) {
    google::protobuf::uint32 size = message.ByteSize();
    char buffer[size]; 
    if(!message.SerializeToArray(buffer, size)) {
        cerr << "Failed to serialize message: \n" << message.DebugString();
        terminate();
    }
    codedOut->WriteVarint32(size);
    codedOut->WriteRaw(buffer, size);
}
bool readMessage(::google::protobuf::Message &message) {
    google::protobuf::uint32 size;
    if (!codedIn->ReadVarint32(&size)) { 
        return false;
    }
    char buffer[size];

    if(!codedIn->ReadRaw(buffer, size)) {
        cerr << "Can't do ReadRaw of message size " << size << "\n";
        terminate();
    }
    message.ParseFromArray(buffer, size);
    return true;
}

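For 1-20 messages it works fine, but if I try to read 50 or more, the last message is corrupted: ReadRaw returns false. If I ignore the ReadRaw return value, the message contains repeated field arrays with missing and empty values. The serialization stage should be fine; I have checked everything.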

Could you tell me what I am doing wrong?

You can get the full gist here: https://gist.github.com/alexeyche/d6af8a43d346edc12868

To reproduce the error you only need to run:

protoc -I. --cpp_out=. ./time_series.proto
g++ main.cpp time_series.pb.cc -std=c++11 -L/usr/local/lib -lprotobuf -I/usr/local/include
./a.out synthetic_control_TRAIN out.pb

The synthetic_control_TRAIN file contains the time series; you can get it here: https://yadi.sk/d/gxZy8JSvcjiVD

My system: g++ 4.8.1, Ubuntu 12.04, libprotobuf 2.6.1

Answer 1

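How big is your data? For safety, CodedInputStream defaults to a limit of 64MiB, after which it refuses to read any more data. You can raise the limit with CodedInputStream::SetTotalBytesLimit(), but a better solution is simply to use a fresh CodedInputStream to read each message. The class is fast to construct and destroy, so just allocate it on the stack, read one message, and let it go out of scope. (Don't reallocate the underlying ZeroCopyInputStream, though.)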
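A minimal sketch of the per-message approach described above, assuming the caller creates one ZeroCopyInputStream over the file and keeps it alive for the whole read loop. The function name readOneMessage is hypothetical; the calls themselves (ReadVarint32, PushLimit, MergeFromCodedStream, ConsumedEntireMessage, PopLimit) are part of the protobuf C++ API.

```cpp
#include <cstdint>

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream.h>
#include <google/protobuf/message_lite.h>

// Hypothetical helper: reads one varint-length-prefixed message.
// `rawInput` is created once by the caller and reused for every call.
bool readOneMessage(google::protobuf::io::ZeroCopyInputStream* rawInput,
                    google::protobuf::MessageLite* message) {
    // A fresh CodedInputStream per message, so the total-bytes limit
    // applies to each message rather than to the whole file.
    google::protobuf::io::CodedInputStream input(rawInput);

    // Read the length prefix; failure here normally means end of input.
    uint32_t size = 0;
    if (!input.ReadVarint32(&size)) return false;

    // Forbid the parser from reading past the end of this message.
    google::protobuf::io::CodedInputStream::Limit limit =
        input.PushLimit(static_cast<int>(size));

    // Parse straight from the stream, with no intermediate buffer.
    if (!message->MergeFromCodedStream(&input)) return false;
    if (!input.ConsumedEntireMessage()) return false;

    input.PopLimit(limit);
    return true;
}
```

Because `input` is destroyed at the end of each call, any bytes it buffered but did not consume are handed back to `rawInput`, so the next call starts exactly at the next length prefix. The raw stream could be, for example, a google::protobuf::io::IstreamInputStream wrapped around a std::ifstream opened in binary mode.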

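By the way, it looks like you are trying to emulate the parseDelimitedFrom() format, which exists in Protobuf-Java but not in Protobuf-C++. The code you wrote is not very efficient, though: you're making an unnecessary copy of each message on the stack. Consider using my code from this StackOverflow answer.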
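For reference, the delimited framing that the question's writeMessage/readMessage pair produces is a base-128 varint holding the message size, followed by that many raw bytes. The sketch below illustrates the framing with the standard library only; it is not protobuf's implementation, all four function names are hypothetical, and the payload stands in for an already serialized message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append `value` as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
void appendVarint32(std::string& out, std::uint32_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Decode a varint starting at `pos`. On success, stores the result in
// `value` and advances `pos`; on failure, leaves both untouched.
bool readVarint32(const std::string& in, std::size_t& pos,
                  std::uint32_t& value) {
    std::uint32_t result = 0;
    std::size_t p = pos;
    for (int shift = 0; shift < 35; shift += 7) {
        if (p >= in.size()) return false;  // truncated input
        const std::uint8_t byte = static_cast<std::uint8_t>(in[p++]);
        result |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            value = result;
            pos = p;
            return true;
        }
    }
    return false;  // more than 5 bytes: not a valid 32-bit varint
}

// Write one frame: varint size prefix, then the payload bytes.
void appendDelimited(std::string& out, const std::string& payload) {
    assert(payload.size() <= UINT32_MAX);
    appendVarint32(out, static_cast<std::uint32_t>(payload.size()));
    out.append(payload);
}

// Read one frame starting at `pos`. Returns false at end of input or
// when the frame is truncated, leaving `pos` untouched.
bool readDelimited(const std::string& in, std::size_t& pos,
                   std::string& payload) {
    std::size_t p = pos;
    std::uint32_t size = 0;
    if (!readVarint32(in, p, size)) return false;
    if (in.size() - p < size) return false;  // payload cut short
    payload.assign(in, p, size);
    pos = p + size;
    return true;
}
```

A reader loops on readDelimited until it returns false. A false return with `pos` short of the end of the input means a truncated frame, which is the same symptom as ReadRaw failing on the last message in the question.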

Protobuf repeated field deserialization
