Question

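I have a program running on an Intel Edison (32-bit Yocto Linux). It reads sensor data and writes it to a file. The data arrives in packets of 1 int and 13 doubles, at 100 packets per second. Some time later, I will pull the files off the device and read them with a tool running on an x64 Windows machine.

Currently I write the data as a raw text file (since strings are nice and portable). However, because of the amount of data that will be written this way, I am looking for ways to save space. At the same time, I want a method that loses no data when the file is interpreted on the other side.

My initial idea was to create a struct like this: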

struct dataStruct{
  char front;
  int a;
  double b, c, d, e, f, g, h, i, j, l, m, n, o;
  char end;
};

and then make a union as follows:

union dataUnion{
  dataStruct d;
  char c[110];
};
//110 was chosen because an int = 4 char, and a double = 8 char,
//so 13*8 = 104, and therefore d = 1 + 4 + 13*8 + 1 = 110

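and then write the char array to the file. However, a bit of reading tells me that such an implementation is not necessarily compatible across operating systems (worse, it may work some of the time and fail at other times).

So I am wondering: is there a portable way to save this data without just saving it as raw text?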

Answer 1

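As others have said: serialization is probably the best solution to your problem.

Since you are in a resource-constrained environment, I suggest using something like MsgPack. It is header-only (given a C++11 compiler), fairly lightweight, has a simple format, and offers a nice C++ interface. It even lets you serialize user-defined types (i.e. classes/structs) very easily: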

// adapted from https://github.com/msgpack/msgpack-c/blob/master/QUICKSTART-CPP.md

#include <msgpack.hpp>
#include <vector>
#include <string>

struct dataStruct {
    int a;
    double b, c, d, e, f, g, h, i, j, l, m, n, oo;  // yes "oo", because "o" clashes with msgpack :/

    MSGPACK_DEFINE(a, b, c, d, e, f, g, h, i, j, l, m, n, oo);
};

int main(void) {
    std::vector<dataStruct> vec;
    // add some elements into vec...

    // you can serialize dataStruct directly
    msgpack::sbuffer sbuf;
    msgpack::pack(sbuf, vec);

    msgpack::unpacked msg;
    msgpack::unpack(&msg, sbuf.data(), sbuf.size());

    msgpack::object obj = msg.get();

    // you can convert object to dataStruct directly
    std::vector<dataStruct> rvec;
    obj.convert(&rvec);
}

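As an alternative, you could look at Google's FlatBuffers. It looks very resource-efficient, but I have not tried it yet.

EDIT: Here is a complete example that illustrates the whole serialization / file I/O / deserialization cycle: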

// adapted from:
// https://github.com/msgpack/msgpack-c/blob/master/QUICKSTART-CPP.md
// https://github.com/msgpack/msgpack-c/wiki/v1_1_cpp_unpacker#msgpack-controls-a-buffer

#include <msgpack.hpp>
#include <fstream>
#include <iostream>

using std::cout;
using std::endl;

struct dataStruct {
    int a;
    double b, c, d, e, f, g, h, i, j, l, m, n, oo;  // yes "oo", because "o" clashes with msgpack :/

    MSGPACK_DEFINE(a, b, c, d, e, f, g, h, i, j, l, m, n, oo);
};

std::ostream& operator<<(std::ostream& out, const dataStruct& ds)
{
    out << "[a:" << ds.a << " b:" << ds.b << " ... oo:" << ds.oo << "]";
    return out;
}

int main(void) {

    // serialize
    {
        // prepare the (buffered) output file; binary mode keeps the bytes untranslated
        std::ofstream ofs("log.bin", std::ios::binary);

        // prepare a data structure
        dataStruct ds;

        // fill in sample data
        ds.a  = 1;
        ds.b  = 1.11;
        ds.oo = 101;
        msgpack::pack(ofs, ds);
        cout << "serialized: " << ds << endl;

        ds.a  = 2;
        ds.b  = 2.22;
        ds.oo = 202;
        msgpack::pack(ofs, ds);
        cout << "serialized: " << ds << endl;

        // continuously receiving data
        //while ( /* data is being received... */ ) {
        //
        //    // initialize ds...
        //
        //    // serialize ds
        //    // You can use any classes that have the following member function:
        //    // https://github.com/msgpack/msgpack-c/wiki/v1_1_cpp_packer#buffer
        //    msgpack::pack(ofs, ds);
        //}
    }

    // deserialize
    {
        // The read size may be decided by receive performance, the transport layer's protocol and so on.

        // prepare the input file (binary mode, so that Windows does not translate any bytes)
        std::ifstream ifs("log.bin", std::ios::binary);

        const std::size_t try_read_size = 100;  // arbitrary number...
        msgpack::unpacker unp;
        dataStruct ds;

        // read data while the stream is still readable...
        while (ifs) {
            unp.reserve_buffer(try_read_size);
            // unp has at least try_read_size bytes of buffer at this point.

            // read directly into msgpack::unpacker's internal buffer.
            // read() + gcount() is used because in_avail()/readsome() are implementation-defined
            // and may report 0 bytes on some standard libraries.
            ifs.read(unp.buffer(), try_read_size);
            std::size_t actual_read_size = static_cast<std::size_t>(ifs.gcount());

            // tell msgpack::unpacker the actual consumed size.
            unp.buffer_consumed(actual_read_size);

            msgpack::unpacked result;
            // Message pack data loop
            while(unp.next(result)) {
                msgpack::object obj(result.get());
                obj.convert(&ds);

                // use ds
                cout << "deserialized: " << ds << endl;
            }
            // All complete msgpack messages are processed at this point,
            // then continue reading additional messages.
        }
    }
}

Output:

serialized: [a:1 b:1.11 ... oo:101]
serialized: [a:2 b:2.22 ... oo:202]
deserialized: [a:1 b:1.11 ... oo:101]
deserialized: [a:2 b:2.22 ... oo:202]

Answer 2

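You need to serialize the data. And since I assume Boost is not an option, you have to do it by hand.

True portability (except for unsigned integers) is a headache. However, if you know that all the systems you use share the same encoding (e.g. two's complement for signed integers and IEEE 754 for floating point), you are in luck: you can do this with basic bit operations.

You need to fill the buffer byte by byte, using masks.

The only thing you then have to take care of is byte order (endianness), which depends on the machine.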
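To make the byte-by-byte idea concrete, here is a minimal sketch, not a definitive implementation. It assumes two's complement integers, IEEE 754 binary64 doubles, and that integers and doubles share the same byte order on each machine. The names `Packet`, `put_le`, `get_le`, `encode` and `decode` are made up for illustration, and little-endian was chosen arbitrarily as the on-disk order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "this sketch assumes 64-bit IEEE 754 doubles");

// Hypothetical record: 1 int + 13 doubles, as described in the question.
struct Packet {
    std::int32_t a;
    double v[13];
};

const std::size_t kPacketBytes = 4 + 13 * 8;  // 108 bytes per record on disk

// Store the low n bytes of value, least significant byte first.
inline void put_le(unsigned char* out, std::uint64_t value, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    }
}

// Rebuild an integer from n bytes stored least significant byte first.
inline std::uint64_t get_le(const unsigned char* in, std::size_t n) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        value |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    }
    return value;
}

// Write one packet into a buffer of kPacketBytes bytes.
void encode(const Packet& p, unsigned char* out) {
    put_le(out, static_cast<std::uint32_t>(p.a), 4);
    for (int i = 0; i < 13; ++i) {
        std::uint64_t bits;
        std::memcpy(&bits, &p.v[i], sizeof bits);  // reinterpret the double's bit pattern
        put_le(out + 4 + 8 * i, bits, 8);
    }
}

// Read one packet back from a buffer of kPacketBytes bytes.
Packet decode(const unsigned char* in) {
    Packet p;
    // Relies on two's complement for the unsigned -> signed conversion.
    p.a = static_cast<std::int32_t>(static_cast<std::uint32_t>(get_le(in, 4)));
    for (int i = 0; i < 13; ++i) {
        std::uint64_t bits = get_le(in + 4 + 8 * i, 8);
        std::memcpy(&p.v[i], &bits, sizeof bits);
    }
    return p;
}
```

Because the shifts operate on values rather than on memory, the same code produces the same bytes on a little-endian and on a big-endian host, so neither side needs to know the other's endianness. The resulting buffer can be written with any binary file API.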

Answer 3

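Don't reinvent the wheel. This is exactly the problem Google Protocol Buffers is designed to solve: transferring well-defined data between machines in a form that does not need to be human-readable (as opposed to, e.g., JSON or XML).

Or you could go really old school and read up on ASN.1.

For completeness, here is a Comparison of data serialization formats, so feel free to pick your poison.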

Answer 4

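The best approach is to use a serialization library (ProtoBuf, Thrift, etc.). But if you cannot use one and need a 'raw' solution, the only way is to describe your struct with special types that have the same size on all platforms: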

struct dataStruct{
  uint32_t a;  // see <cstdint> (or <stdint.h>), or boost
  /// ...
};

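You also need to be careful about byte order. Whenever you serialize the struct (pass it to the 'other' side or save it to a file), you must always convert all fields to little-endian (or big-endian).

Another thing to keep in mind is struct packing; see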

#pragma pack(1)

or

__attribute__((packed))

This is a broad topic, so the simplest solution is to use a serializer.
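As a rough illustration of why packing matters for the 110-byte assumption in the question, the sketch below declares the same fields twice, once with the compiler's natural layout and once under `#pragma pack(1)`. The struct names are hypothetical, and the 13 doubles are written as an array for brevity. The exact size of the unpacked struct depends on the compiler and target, so it is deliberately not asserted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Natural layout: the compiler may insert padding after `front` and after `end`.
struct PaddedRecord {
    char front;
    std::int32_t a;
    double v[13];
    char end;
};

// Packed layout: #pragma pack(push, 1) is accepted by GCC, Clang and MSVC.
#pragma pack(push, 1)
struct PackedRecord {
    char front;
    std::int32_t a;
    double v[13];
    char end;
};
#pragma pack(pop)

// Only the packed variant is guaranteed to match 1 + 4 + 13*8 + 1 = 110 bytes.
static_assert(sizeof(PackedRecord) == 110, "packed record must be exactly 110 bytes");
static_assert(offsetof(PackedRecord, a) == 1, "no padding after front");
static_assert(offsetof(PackedRecord, v) == 5, "no padding after a");
static_assert(offsetof(PackedRecord, end) == 109, "end is the last byte");
```

Packing only removes padding. It does nothing about byte order, and pointers to members of a packed struct can be misaligned, which is one more reason the answer recommends a serializer.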

Answer 5

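I would say that text is the safest. Saving the data as raw ints and doubles opens you up to big/little-endian problems and possibly to trouble with the layout of doubles. Even going to text can cause problems if you do not separate the individual values from each other.

Another approach is to define a "universal" format of your own and convert to/from it in your write/read operations. Output the int as plain text, and the doubles as pseudo-scientific-notation text: say, a 5-character mantissa, an 'e', and a 2- or 3-character exponent.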
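A small sketch of the scientific-notation idea follows. Note one deliberate deviation: a 5-character mantissa would round the values, and the question asks for no data loss, so this sketch prints 17 significant digits, which is enough to round-trip an IEEE 754 double when the C library converts correctly. The function names are made up for illustration, and the default "C" locale (decimal point `.`) is assumed.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Format a double as d.dddddddddddddddde+XX (1 + 16 = 17 significant digits).
std::string double_to_text(double x) {
    char buf[40];  // comfortably larger than the longest possible output
    std::snprintf(buf, sizeof buf, "%.16e", x);
    return std::string(buf);
}

// Parse the text produced by double_to_text back into a double.
double text_to_double(const std::string& s) {
    return std::strtod(s.c_str(), nullptr);
}

// Build one line of text for a packet: the int, then 13 doubles, separated by spaces.
std::string packet_to_line(int a, const double (&v)[13]) {
    std::string line = std::to_string(a);
    for (double x : v) {
        line += ' ';  // the separator keeps neighbouring values apart
        line += double_to_text(x);
    }
    return line;
}
```

Each double costs about 23 characters in this form, versus 8 bytes in binary, so this trades space for safety.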

Trying to figure out a portable way to save data
