C++: loading data from part of a file

Date: 2013-04-14 20:26:36

Tags: c++ serialization file-io binaryfiles boost-serialization

I want to keep a bunch of simple structures (just 3 ints each at the moment) in a file, and be able to read back just one of them at any given time.

As a first step, I'm trying to write them to a file and then read them back using boost::serialization. This is what I'm doing at the moment, and it crashes:

std::array<Patch, 3> outPatches;

outPatches[0].ZOrigin = 0;
outPatches[0].XOrigin = 0;
outPatches[0].Resolution = 64;

outPatches[1].ZOrigin = 1;
outPatches[1].XOrigin = 5;
outPatches[1].Resolution = 3;

outPatches[2].ZOrigin = 123;
outPatches[2].XOrigin = 546;
outPatches[2].Resolution = 6;

std::ofstream ofs("testing.sss", std::ios::binary);

for (auto const& patch : outPatches)
{
    std::cout << "start archive: " << ofs.tellp() << std::endl;
    {
        boost::archive::binary_oarchive oa(ofs);
        std::cout << "start patch: " << ofs.tellp() << std::endl;

        oa << patch;
    }
}

ofs.close();


std::array<Patch, 3> inPatches;

std::ifstream ifs("testing.sss", std::ios::binary);

for (auto& patch : inPatches)
{
    std::cout << "start archive: " << ifs.tellg() << std::endl;
    {
        boost::archive::binary_iarchive ia(ifs); // <-- crash here on second patch

        std::cout << "start patch: " << ifs.tellg() << std::endl;

        ia >> patch;
    }
}

ifs.close();

for (int i = 0; i != 3; ++i)
    std::cout << "check: " << (inPatches[i] == outPatches[i]) << std::endl;

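I'm planning to use the positions reported by tellp() to build an index of where each structure is, and then seek to that position when loading. Is this a reasonable approach? I don't know much about streams beyond the basics.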
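Because each structure is currently just three integers, one possible alternative to an explicit tellp() index is to store fixed-size records, so that the offset of record i is simply i * sizeof(record). The following is a minimal sketch of that idea using plain iostreams rather than boost::serialization. PatchRecord, writeRecords and readRecord are hypothetical names introduced for illustration, and the file is written in the host's native byte order, so it is assumed to be read back on the same kind of machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical fixed-size record mirroring the three fields in the question.
struct PatchRecord {
    std::int32_t ZOrigin;
    std::int32_t XOrigin;
    std::int32_t Resolution;
};

// Write all records back to back in the host's native byte order.
bool writeRecords(const std::string& path, const std::vector<PatchRecord>& records) {
    std::ofstream ofs(path, std::ios::binary | std::ios::trunc);
    for (const PatchRecord& r : records)
        ofs.write(reinterpret_cast<const char*>(&r), sizeof r);
    ofs.flush();  // surface write errors before reporting success
    return static_cast<bool>(ofs);
}

// Read only record `index`; its offset is index * sizeof(PatchRecord).
// Returns false if the file cannot be opened or the record does not exist.
bool readRecord(const std::string& path, std::size_t index, PatchRecord& out) {
    std::ifstream ifs(path, std::ios::binary);
    ifs.seekg(static_cast<std::streamoff>(index * sizeof(PatchRecord)));
    ifs.read(reinterpret_cast<char*>(&out), sizeof out);
    return static_cast<bool>(ifs);
}
```

With the three patches from the code above written by writeRecords, a call such as readRecord("testing.sss", 2, p) would load only the third one. This only works while every record has the same size; variable-length records would still need an offset index like the one described in the question.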

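I have also tried putting all the patches into a single oarchive / iarchive, which works fine for reading everything back sequentially. However, seeking on the stream doesn't work.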

I've found this, which might be what I want, but I have no idea what it's doing or how to use it, or whether it would work with boost::serialization: read part of a file with iostreams

I'd be willing to switch to another serialization method if necessary, since I don't know this one very well.

Edit 3: moved edits 1 and 2 into an answer.

2 answers:

Answer 0: (score: 1)

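I once had a similar case (using boost/serialization). What I did back then (and it was very efficient, if I recall correctly) was to map the file into the virtual address space and write a streamer that operates on a memory buffer instead of a file. For each part I wanted to read, I gave the streamer the appropriate offset as buffer start/length and initialized an iarchive with the streamer, so the serialization library treated each object as if it were in a separate file.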

Of course, adding to the file required remapping it. Looking back at this now, it seems a bit odd, but it was efficient and worked well.
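The "streamer" described above could be sketched as a read-only std::streambuf over a byte range. This is only an illustration under stated assumptions, not the answerer's actual code: MemoryRangeBuf and readWindow are hypothetical names, and a std::vector<char> stands in for the memory-mapped file so that the example needs nothing beyond the standard library.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <streambuf>
#include <string>
#include <vector>

// Read-only stream buffer over an existing byte range.
// It neither owns nor copies the bytes, so the range must outlive it.
class MemoryRangeBuf : public std::streambuf {
public:
    MemoryRangeBuf(const char* data, std::size_t offset, std::size_t length) {
        // setg takes char*, hence the const_cast; nothing is ever written.
        char* begin = const_cast<char*>(data) + offset;
        setg(begin, begin, begin + length);
    }
};

// Read `length` bytes starting at `offset` as if they were a whole file.
// `mapped` stands in for the mapped file contents (an assumption of this sketch).
std::string readWindow(const std::vector<char>& mapped,
                       std::size_t offset, std::size_t length) {
    MemoryRangeBuf buf(mapped.data(), offset, length);
    std::istream in(&buf);  // reports end-of-file at the end of the window
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

A std::istream built on such a buffer is what would be handed to the iarchive in the scheme described above. The sketch does not override seekoff or seekpos, so seekg on that stream fails; only sequential reads within the window are supported.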

Answer 1: (score: 1)

Boost serialization

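It doesn't seem to be possible to skip around inside a boost serialization archive. The best I've got so far is to use multiple archives on one stream: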

static const int numPatches = 5000;

std::vector<int> indices(numPatches, 0);
std::iota(indices.begin(), indices.end(), 0);

std::vector<Patch> outPatches(numPatches, Patch());

std::for_each(outPatches.begin(), outPatches.end(), 
    [] (Patch& p)
    {
        p.ZOrigin = rand();
        p.XOrigin = rand();
        p.Resolution = rand();
    });


std::vector<int64_t> offsets(numPatches, 0);

std::ofstream ofs("testing.sss", std::ios::binary);

for (auto i : indices)
{
    offsets[i] = ofs.tellp();

    boost::archive::binary_oarchive oa(ofs, 
        boost::archive::no_header | boost::archive::no_tracking);
    oa << outPatches[i];
}

ofs.close();


std::random_shuffle(indices.begin(), indices.end());


std::vector<Patch> inPatches(numPatches, Patch());

std::ifstream ifs("testing.sss", std::ios::binary);

for (auto i : indices)
{
    ifs.seekg(offsets[i]);

    boost::archive::binary_iarchive ia(ifs,
        boost::archive::no_header | boost::archive::no_tracking);
    ia >> inPatches[i];

    ifs.clear();
}

std::cout << std::all_of(indices.begin(), indices.end(), 
    [&] (int i) { return inPatches[i] == outPatches[i]; }) << std::endl;

Unfortunately, this is slow, so I don't think I can use it. Next up is testing protobuf.


google::protobuf

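I've got something working with protobuf. It needed some fiddling (apparently I have to use the LimitingInputStream type and store the size of each object), but it is much faster than the boost::serialization version: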

static const int numPatches = 500;

std::vector<int> indices(numPatches, 0);
std::iota(indices.begin(), indices.end(), 0);

std::vector<Patch> outPatches(numPatches, Patch());

std::for_each(outPatches.begin(), outPatches.end(), 
    [] (Patch& p)
    {
        p.ZOrigin = rand();
        p.XOrigin = rand();
        p.Resolution = 64;
    });


std::vector<int64_t> streamOffset(numPatches, 0);
std::vector<int64_t> streamSize(numPatches, 0);

std::ofstream ofs("testing.sss", std::ios::binary);

PatchBuffer buffer;

for (auto i : indices)
{
    buffer.Clear();

    WriteToPatchBuffer(buffer, outPatches[i]);

    streamOffset[i] = ofs.tellp();
    streamSize[i] = buffer.ByteSize();

    buffer.SerializeToOstream(&ofs);
}

ofs.close();

std::random_shuffle(indices.begin(), indices.end());

std::vector<Patch> inPatches(numPatches, Patch());

std::ifstream ifs("testing.sss", std::ios::binary);

for (auto i : indices)
{
    ifs.seekg(streamOffset[i]);

    buffer.Clear();

    google::protobuf::io::IstreamInputStream iis(&ifs);
    google::protobuf::io::LimitingInputStream lis(&iis, streamSize[i]);
    buffer.ParseFromZeroCopyStream(&lis);

    ReadFromPatchBuffer(inPatches[i], buffer);

    ifs.clear();
}

std::cout << std::all_of(indices.begin(), indices.end(), 
    [&] (int i) { return inPatches[i] == outPatches[i]; }) << std::endl;
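The offset-plus-size index used above does not depend on protobuf itself. As a rough sketch of the same bookkeeping with plain iostreams, the code below writes variable-length byte blobs back to back, records where each one starts and how long it is, and later reads exactly one blob. IndexEntry, writeBlobs and readBlob are hypothetical names introduced here, and std::string is used as a stand-in for an already serialized message. Reading the bytes into a string first is a different technique from the LimitingInputStream approach in the answer, not a description of it.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Where one serialized object starts in the file and how many bytes it occupies.
struct IndexEntry {
    std::int64_t offset;
    std::int64_t size;
};

// Write the blobs back to back and return one index entry per blob.
std::vector<IndexEntry> writeBlobs(const std::string& path,
                                   const std::vector<std::string>& blobs) {
    std::ofstream ofs(path, std::ios::binary | std::ios::trunc);
    std::vector<IndexEntry> index;
    for (const std::string& blob : blobs) {
        index.push_back({static_cast<std::int64_t>(ofs.tellp()),
                         static_cast<std::int64_t>(blob.size())});
        ofs.write(blob.data(), static_cast<std::streamsize>(blob.size()));
    }
    return index;
}

// Seek to one entry and read exactly entry.size bytes into `out`.
// Returns false if the file cannot be opened or is too short.
bool readBlob(const std::string& path, const IndexEntry& entry, std::string& out) {
    std::ifstream ifs(path, std::ios::binary);
    ifs.seekg(static_cast<std::streamoff>(entry.offset));
    out.resize(static_cast<std::size_t>(entry.size));
    ifs.read(&out[0], static_cast<std::streamsize>(entry.size));
    return static_cast<bool>(ifs);
}
```

With protobuf, the bytes returned by readBlob could then be passed to the message's ParseFromString. Note that the index lives only in memory here, as it does in the answer's code; it would have to be saved somewhere to be usable in a later run.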