Question

I want to store the following struct on disk and be able to read it back again (C++):

struct pixels {
    std::vector<cv::Point> indexes;
    cv::Mat values;
};

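I tried using ofstream and ifstream, but they need the size of the variable, and in this case I really don't know how to compute it. It is not a simple struct with an int and a double. Is there a way to do this in C++, preferably without using any third-party libraries?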

(I actually come from Matlab, where this is easily done with save(filename, variables).)

Edit
I just tried Boost Serialization. Unfortunately, it is very slow for my use case.

Answer 1

Several approaches come to mind, each with its own pros and cons:

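- Use OpenCV's XML/YAML persistence functionality
  - XML format (portable)
  - YAML format (portable)
  - JSON format (portable)
- Use Boost.Serialization
  - Plain-text format (portable)
  - XML format (portable)
  - Binary format (non-portable)
- Raw data to std::fstream
  - Binary format (non-portable)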

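By "portable" I mean that a data file written on an arbitrary platform + compiler can be read on any other platform + compiler. By "non-portable" I mean that this is not necessarily the case. Endianness matters, and compilers may also make a difference (type sizes, struct padding). You can add extra handling for such cases at the cost of performance. In this answer, I assume you are reading and writing on the same machine.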
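As a rough illustration of that extra handling, here is a minimal sketch, assuming you only need to store unsigned 32-bit integers. The helper names `write_u32_le` and `read_u32_le` are hypothetical. Each value is written byte by byte in little-endian order, so the file layout no longer depends on the host's byte order. This is slower than a single bulk `write`, which is the performance cost mentioned above.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>

// Hypothetical helper: write a 32-bit value as 4 bytes, least significant byte first,
// regardless of the host's native byte order.
void write_u32_le(std::ostream& os, std::uint32_t v)
{
    char bytes[4];
    for (int i = 0; i < 4; ++i) {
        bytes[i] = static_cast<char>((v >> (8 * i)) & 0xFFu);
    }
    os.write(bytes, 4);
}

// Hypothetical helper: read 4 little-endian bytes back into a 32-bit value.
// Returns false if the stream ran out of data.
bool read_u32_le(std::istream& is, std::uint32_t& v)
{
    unsigned char bytes[4];
    if (!is.read(reinterpret_cast<char*>(bytes), 4)) {
        return false;
    }
    v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
    }
    return true;
}
```

The same idea extends to signed and 64-bit values. Floating-point data additionally assumes both platforms use IEEE 754.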

First, the common includes, data structures and utility functions we will use:

#include <opencv2/opencv.hpp>

#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>

#include <boost/filesystem.hpp>

#include <boost/serialization/vector.hpp>

#include <chrono>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// ============================================================================

using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::microseconds;

namespace ba = boost::archive;
namespace bs = boost::serialization;
namespace fs = boost::filesystem;

// ============================================================================

struct pixels
{
    std::vector<cv::Point> indexes;
    cv::Mat values;
};

struct test_results
{
    bool matches;
    double write_time_ms;
    double read_time_ms;
    size_t file_size;
};

// ----------------------------------------------------------------------------

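bool validate(pixels const& pix_out, pixels const& pix_in)
{
    if (pix_out.indexes != pix_in.indexes) {
        return false;
    }
    cv::Mat const& a(pix_out.values);
    cv::Mat const& b(pix_in.values);
    if (a.size() != b.size() || a.type() != b.type()) {
        return false;
    }
    // cv::countNonZero only accepts single-channel input, so compare the
    // (possibly multi-channel) matrices via the L-infinity norm of their difference.
    return a.empty() || cv::norm(a, b, cv::NORM_INF) == 0;
}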

pixels generate_data()
{
    pixels pix;
    for (int i(0); i < 10000; ++i) {
        pix.indexes.emplace_back(i, 2 * i);
    }
    pix.values = cv::Mat(1024, 1024, CV_8UC3);
    cv::randu(pix.values, 0, 256);

    return pix;
}

void dump_results(std::string const& label, test_results const& results)
{
    std::cout << label << "\n";
    std::cout << "Matched = " << (results.matches ? "true" : "false") << "\n";
    std::cout << "Write time = " << results.write_time_ms << " ms\n";
    std::cout << "Read time = " << results.read_time_ms << " ms\n";
    std::cout << "File size = " << results.file_size << " bytes\n";
    std::cout << "\n";
}

// ============================================================================

Using OpenCV FileStorage

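The first obvious choice is to use the serialization functionality OpenCV provides: cv::FileStorage, cv::FileNode and cv::FileNodeIterator. There is a nice tutorial in the 2.4.x documentation, which I can't seem to find in the new documentation right now.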

The advantage here is that cv::Mat and cv::Point are already supported, so there is very little to implement.

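However, all the formats provided are textual, so reading and writing the values is fairly costly (especially for the cv::Mat). It may be advantageous to save/load the cv::Mat using cv::imwrite / cv::imread and serialize only the file name. I leave that to the reader to implement and benchmark.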

// ============================================================================

void save_pixels(pixels const& pix, cv::FileStorage& fs)
{
    fs << "indexes" << "[";
    for (auto const& index : pix.indexes) {
        fs << index;
    }
    fs << "]";
    fs << "values" << pix.values;
}

void load_pixels(pixels& pix, cv::FileStorage& fs)
{
    cv::FileNode n(fs["indexes"]);
    if (n.type() != cv::FileNode::SEQ) {
        throw std::runtime_error("Input format error: `indexes` is not a sequence.");
    }

    pix.indexes.clear();
    cv::FileNodeIterator it(n.begin()), it_end(n.end());
    cv::Point pt;
    for (; it != it_end; ++it) {
        (*it) >> pt;
        pix.indexes.push_back(pt);
    }

    fs["values"] >> pix.values;
}

// ----------------------------------------------------------------------------

test_results test_cv_filestorage(std::string const& file_name, pixels const& pix)
{
    test_results results;
    pixels pix_in;

    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    {
        cv::FileStorage fs(file_name, cv::FileStorage::WRITE);

        save_pixels(pix, fs);
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    {
        cv::FileStorage fs(file_name, cv::FileStorage::READ);

        load_pixels(pix_in, fs);
    }
    high_resolution_clock::time_point t3 = high_resolution_clock::now();

    results.matches = validate(pix, pix_in);
    results.write_time_ms = static_cast<double>(duration_cast<microseconds>(t2 - t1).count()) / 1000;
    results.read_time_ms = static_cast<double>(duration_cast<microseconds>(t3 - t2).count()) / 1000;
    results.file_size = fs::file_size(file_name);

    return results;
}

// ============================================================================

Using Boost Serialization

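Another possible approach is to use the Boost.Serialization library, as you mentioned. We have three options for the archive format: two are textual (and portable), and one is binary (non-portable, but much more efficient).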

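There is more work to do here. We need to provide proper serialization for cv::Mat, cv::Point and the pixels struct. Support for std::vector is already provided, and to handle XML we need to generate name-value pairs.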

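For the two text formats, it may again be advantageous to save the cv::Mat as an image and serialize only the path. The reader is free to try this approach. For the binary format, it would most likely be a trade-off between space and time. Again, feel free to test it (you could even use cv::imencode and cv::imdecode).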

// ============================================================================

namespace boost { namespace serialization {

template<class Archive>
void serialize(Archive &ar, cv::Mat& mat, const unsigned int)
{
    int cols, rows, type;
    bool continuous;

    if (Archive::is_saving::value) {
        cols = mat.cols; rows = mat.rows; type = mat.type();
        continuous = mat.isContinuous();
    }

    ar & boost::serialization::make_nvp("cols", cols);
    ar & boost::serialization::make_nvp("rows", rows);
    ar & boost::serialization::make_nvp("type", type);
    ar & boost::serialization::make_nvp("continuous", continuous);

    if (Archive::is_loading::value)
        mat.create(rows, cols, type);

    if (continuous) {
        size_t const data_size(rows * cols * mat.elemSize());
        ar & boost::serialization::make_array(mat.ptr(), data_size);
    } else {
        size_t const row_size(cols * mat.elemSize());
        for (int i = 0; i < rows; i++) {
            ar & boost::serialization::make_array(mat.ptr(i), row_size);
        }
    }
}

template<class Archive>
void serialize(Archive &ar, cv::Point& pt, const unsigned int)
{
    ar & boost::serialization::make_nvp("x", pt.x);
    ar & boost::serialization::make_nvp("y", pt.y);
}

template<class Archive>
void serialize(Archive &ar, ::pixels& pix, const unsigned int)
{
    ar & boost::serialization::make_nvp("indexes", pix.indexes);
    ar & boost::serialization::make_nvp("values", pix.values);
}

}}

// ----------------------------------------------------------------------------

template <typename OArchive, typename IArchive>
test_results test_bs_filestorage(std::string const& file_name
    , pixels const& pix
    , bool binary = false)
{
    test_results results;
    pixels pix_in;

    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    {
        std::ios::openmode mode(std::ios::out);
        if (binary) mode |= std::ios::binary;
        std::ofstream ofs(file_name.c_str(), mode);
        OArchive oa(ofs);

        oa & boost::serialization::make_nvp("pixels", pix);
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    {
        std::ios::openmode mode(std::ios::in);
        if (binary) mode |= std::ios::binary;
        std::ifstream ifs(file_name.c_str(), mode);
        IArchive ia(ifs);

        ia & boost::serialization::make_nvp("pixels", pix_in);
    }
    high_resolution_clock::time_point t3 = high_resolution_clock::now();

    results.matches = validate(pix, pix_in);
    results.write_time_ms = static_cast<double>(duration_cast<microseconds>(t2 - t1).count()) / 1000;
    results.read_time_ms = static_cast<double>(duration_cast<microseconds>(t3 - t2).count()) / 1000;
    results.file_size = fs::file_size(file_name);

    return results;
}

// ============================================================================

Raw data to `std::fstream`

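If we don't care about the portability of the data files, we can do the minimal amount of work needed to dump and restore the memory. With some effort (at the cost of speed) you can make it more flexible.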
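A minimal sketch of that extra flexibility, using only the standard library. `point`, `image` and `pixels_std` are hypothetical stand-ins for cv::Point, a continuous cv::Mat and the pixels struct, and the magic number and version field are assumptions rather than any established format. The sketch writes a small header so the loader can reject foreign or outdated files, length-prefixes the vector, writes point fields one at a time so struct padding never reaches the file, and checks the stream state on every read. It still stores integers in the host's native byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for cv::Point and a continuous cv::Mat.
struct point { std::int32_t x, y; };
struct image {
    std::int32_t rows = 0, cols = 0, type = 0, elem_size = 0;
    std::vector<unsigned char> data; // rows * cols * elem_size bytes
};
struct pixels_std {
    std::vector<point> indexes;
    image values;
};

// Hypothetical file tag and version (arbitrary values chosen for this sketch).
constexpr std::uint32_t kMagic = 0x50495831;
constexpr std::uint32_t kVersion = 1;

template <typename T>
void put(std::ostream& os, T const& v)
{
    os.write(reinterpret_cast<char const*>(&v), sizeof(v));
}

template <typename T>
void get(std::istream& is, T& v)
{
    if (!is.read(reinterpret_cast<char*>(&v), sizeof(v)))
        throw std::runtime_error("unexpected end of file");
}

void save_raw(pixels_std const& pix, std::string const& file_name)
{
    std::ofstream ofs(file_name, std::ios::out | std::ios::binary);
    put(ofs, kMagic);
    put(ofs, kVersion);
    // Length prefix: the reader needs the element count before it can size the vector.
    put(ofs, static_cast<std::uint64_t>(pix.indexes.size()));
    for (point const& p : pix.indexes) { put(ofs, p.x); put(ofs, p.y); }
    put(ofs, pix.values.rows); put(ofs, pix.values.cols);
    put(ofs, pix.values.type); put(ofs, pix.values.elem_size);
    ofs.write(reinterpret_cast<char const*>(pix.values.data.data()),
              static_cast<std::streamsize>(pix.values.data.size()));
    if (!ofs) throw std::runtime_error("write failed: " + file_name);
}

pixels_std load_raw(std::string const& file_name)
{
    std::ifstream ifs(file_name, std::ios::in | std::ios::binary);
    if (!ifs) throw std::runtime_error("cannot open " + file_name);
    std::uint32_t magic = 0, version = 0;
    get(ifs, magic);
    get(ifs, version);
    if (magic != kMagic || version != kVersion)
        throw std::runtime_error("unsupported file format");

    pixels_std pix;
    std::uint64_t count = 0;
    get(ifs, count);
    pix.indexes.resize(static_cast<std::size_t>(count));
    for (point& p : pix.indexes) { get(ifs, p.x); get(ifs, p.y); }

    image& img = pix.values;
    get(ifs, img.rows); get(ifs, img.cols); get(ifs, img.type); get(ifs, img.elem_size);
    if (img.rows < 0 || img.cols < 0 || img.elem_size < 0)
        throw std::runtime_error("corrupt image header");
    img.data.resize(static_cast<std::size_t>(img.rows) * img.cols * img.elem_size);
    if (!ifs.read(reinterpret_cast<char*>(img.data.data()),
                  static_cast<std::streamsize>(img.data.size())))
        throw std::runtime_error("truncated pixel data");
    return pix;
}
```

Writing the point fields individually is slower than the single bulk write of the cv::Point array, which is exactly the speed-for-flexibility trade mentioned in the text.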

// ============================================================================

void save_pixels(pixels const& pix, std::ofstream& ofs)
{
    size_t index_count(pix.indexes.size());
    ofs.write(reinterpret_cast<char const*>(&index_count), sizeof(index_count));
    ofs.write(reinterpret_cast<char const*>(pix.indexes.data()), sizeof(cv::Point) * index_count);

    int cols(pix.values.cols), rows(pix.values.rows), type(pix.values.type());
    bool continuous(pix.values.isContinuous());

    ofs.write(reinterpret_cast<char const*>(&cols), sizeof(cols));
    ofs.write(reinterpret_cast<char const*>(&rows), sizeof(rows));
    ofs.write(reinterpret_cast<char const*>(&type), sizeof(type));
    ofs.write(reinterpret_cast<char const*>(&continuous), sizeof(continuous));

    if (continuous) {
        size_t const data_size(rows * cols * pix.values.elemSize());
        ofs.write(reinterpret_cast<char const*>(pix.values.ptr()), data_size);
    } else {
        size_t const row_size(cols * pix.values.elemSize());
        for (int i(0); i < rows; ++i) {
            ofs.write(reinterpret_cast<char const*>(pix.values.ptr(i)), row_size);
        }
    }
}

void load_pixels(pixels& pix, std::ifstream& ifs)
{
    size_t index_count(0);
    ifs.read(reinterpret_cast<char*>(&index_count), sizeof(index_count));
    pix.indexes.resize(index_count);
    ifs.read(reinterpret_cast<char*>(pix.indexes.data()), sizeof(cv::Point) * index_count);

    int cols, rows, type;
    bool continuous;

    ifs.read(reinterpret_cast<char*>(&cols), sizeof(cols));
    ifs.read(reinterpret_cast<char*>(&rows), sizeof(rows));
    ifs.read(reinterpret_cast<char*>(&type), sizeof(type));
    ifs.read(reinterpret_cast<char*>(&continuous), sizeof(continuous));

    pix.values.create(rows, cols, type);

    if (continuous) {
        size_t const data_size(rows * cols * pix.values.elemSize());
        ifs.read(reinterpret_cast<char*>(pix.values.ptr()), data_size);
    } else {
        size_t const row_size(cols * pix.values.elemSize());
        for (int i(0); i < rows; ++i) {
            ifs.read(reinterpret_cast<char*>(pix.values.ptr(i)), row_size);
        }
    }
}

// ----------------------------------------------------------------------------

test_results test_raw(std::string const& file_name, pixels const& pix)
{
    test_results results;
    pixels pix_in;

    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    {
        std::ofstream ofs(file_name.c_str(), std::ios::out | std::ios::binary);

        save_pixels(pix, ofs);
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    {
        std::ifstream ifs(file_name.c_str(), std::ios::in | std::ios::binary);

        load_pixels(pix_in, ifs);
    }
    high_resolution_clock::time_point t3 = high_resolution_clock::now();

    results.matches = validate(pix, pix_in);
    results.write_time_ms = static_cast<double>(duration_cast<microseconds>(t2 - t1).count()) / 1000;
    results.read_time_ms = static_cast<double>(duration_cast<microseconds>(t3 - t2).count()) / 1000;
    results.file_size = fs::file_size(file_name);

    return results;
}

// ============================================================================

Complete `main()`

Let's run all the tests for the various approaches and compare the results.

Code:

// ============================================================================

int main()
{
    namespace ba = boost::archive;

    pixels pix(generate_data());

    auto r_c_xml = test_cv_filestorage("test.cv.xml", pix);
    auto r_c_yaml = test_cv_filestorage("test.cv.yaml", pix);
    auto r_c_json = test_cv_filestorage("test.cv.json", pix);

    auto r_b_txt = test_bs_filestorage<ba::text_oarchive, ba::text_iarchive>("test.bs.txt", pix);
    auto r_b_xml = test_bs_filestorage<ba::xml_oarchive, ba::xml_iarchive>("test.bs.xml", pix);
    auto r_b_bin = test_bs_filestorage<ba::binary_oarchive, ba::binary_iarchive>("test.bs.bin", pix, true);

    auto r_b_raw = test_raw("test.raw", pix);

    // ----

    dump_results("OpenCV - XML", r_c_xml);
    dump_results("OpenCV - YAML", r_c_yaml);
    dump_results("OpenCV - JSON", r_c_json);
    dump_results("Boost - TXT", r_b_txt);
    dump_results("Boost - XML", r_b_xml);
    dump_results("Boost - Binary", r_b_bin);
    dump_results("Raw", r_b_raw);

    return 0;
}

// ============================================================================

Console output (i7-4930k, Win10, MSVC 2013)

Note: we are testing this with 10000 indexes and values being a 1024x1024 BGR image.

OpenCV - XML
Matched = true
Write time = 257.563 ms
Read time = 257.016 ms
File size = 12323677 bytes

OpenCV - YAML
Matched = true
Write time = 135.498 ms
Read time = 311.999 ms
File size = 16353873 bytes

OpenCV - JSON
Matched = true
Write time = 137.003 ms
Read time = 312.528 ms
File size = 16353873 bytes

Boost - TXT
Matched = true
Write time = 1293.84 ms
Read time = 1210.94 ms
File size = 11333696 bytes

Boost - XML
Matched = true
Write time = 4890.82 ms
Read time = 4042.75 ms
File size = 62095856 bytes

Boost - Binary
Matched = true
Write time = 12.498 ms
Read time = 4 ms
File size = 3225813 bytes

Raw
Matched = true
Write time = 8.503 ms
Read time = 2.999 ms
File size = 3225749 bytes
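As a sanity check on the Raw result, the file size follows directly from what save_pixels(pixels const&, std::ofstream&) writes. This assumes a 64-bit build where sizeof(size_t) == 8, sizeof(int) == 4, sizeof(bool) == 1 and sizeof(cv::Point) == 8:

$$
\underbrace{8}_{\text{index\_count}} + \underbrace{10000 \times 8}_{\text{indexes}} + \underbrace{3 \times 4}_{\text{cols, rows, type}} + \underbrace{1}_{\text{continuous}} + \underbrace{1024 \times 1024 \times 3}_{\text{values}} = 3225749 \text{ bytes}
$$

This matches the reported size exactly, so the raw file carries no overhead beyond the data and its 21-byte header. The Boost binary archive is only 64 bytes larger.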

Conclusion

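Looking at the results, the textual Boost.Serialization formats are abhorrently slow; I see what you meant. Saving values separately would definitely bring a significant benefit. The binary approach is quite good if portability is not an issue, and you could still address portability at a reasonable cost.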

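OpenCV performs much better. XML is balanced between reading and writing, while YAML/JSON (apparently identical) are faster at writing but slower at reading. It is still rather sluggish, so writing values as an image and saving only the file name may still be beneficial.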

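The raw approach is the fastest (no surprise), but also inflexible. You could make some improvements, of course, but that seems to need a lot more code than using a binary Boost.Archive, so it is not really worth it here. Still, if you are doing everything on the same machine, this may do the job.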

Personally, I would go for the binary Boost approach, and tweak it if you need cross-platform capability.

Storing a struct containing a vector and a cv::Mat to disk - data serialization in C++
