Question

Normally I would just use C-style file I/O, but I'm trying a modern C++ approach, including the C++17-specific features std::byte and std::filesystem.

Reading an entire file into memory, the traditional way:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

char *readFileData(char *path)
{
    FILE *f;
    struct stat fs;
    char *buf;

    stat(path, &fs);
    buf = (char *)malloc(fs.st_size);

    f = fopen(path, "rb");
    fread(buf, fs.st_size, 1, f);
    fclose(f);

    return buf;
}

Reading an entire file into memory, the modern way:

#include <cstddef>
#include <filesystem>
#include <fstream>
#include <memory>
#include <string>
using namespace std;
using namespace std::filesystem;

auto readFileData(string path)
{
    auto fileSize = file_size(path);
    auto buf = make_unique<byte[]>(fileSize);
    basic_ifstream<byte> ifs(path, ios::binary);
    ifs.read(buf.get(), fileSize);
    return buf;
}

Does this look correct? Can it be improved?

Answer 1

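Personally I prefer std::vector<std::byte> over std::string unless you are reading an actual text document. The problem with make_unique<byte[]>(fileSize); is that you immediately lose the size of the data and have to carry it around in a separate variable. A buffer that skips zero-initialization might be a tiny fraction faster than std::vector<std::byte> (note that make_unique<byte[]> itself value-initializes its elements, so it gains nothing on that front), but I think any such difference would probably always be eclipsed by the time it takes to read from disk.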
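To make the point about the lost size concrete, here is a minimal sketch. RawBuffer, make_raw and make_vec are hypothetical names introduced only for illustration; they are not part of the question or of any library.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper type: std::unique_ptr<std::byte[]> has no size(),
// so the length has to travel alongside the pointer.
struct RawBuffer {
    std::unique_ptr<std::byte[]> data;
    std::size_t size;
};

// Allocates n bytes and records the length by hand.
RawBuffer make_raw(std::size_t n)
{
    return {std::make_unique<std::byte[]>(n), n};
}

// The vector remembers its own length; callers just ask for size().
std::vector<std::byte> make_vec(std::size_t n)
{
    return std::vector<std::byte>(n);
}
```

With the vector, a caller can write buffer.size() anywhere the data goes; with the raw array, every function that receives the pointer also needs the length passed in.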

So for binary files I use something like this:

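#include <cerrno>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::byte> load_file(std::string const& filepath)
{
    // Open in binary mode, positioned at the end of the file (ate).
    std::ifstream ifs(filepath, std::ios::binary|std::ios::ate);

    if(!ifs)
        throw std::runtime_error(filepath + ": " + std::strerror(errno));

    auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);

    // The size is the distance between the end and start positions.
    auto size = std::size_t(end - ifs.tellg());

    if(size == 0) // avoid undefined behavior
        return {};

    std::vector<std::byte> buffer(size);

    if(!ifs.read(reinterpret_cast<char*>(buffer.data()), static_cast<std::streamsize>(buffer.size())))
        throw std::runtime_error(filepath + ": " + std::strerror(errno));

    return buffer;
}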

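This is the quickest way that I know of. It also avoids a common mistake in determining the size of the data in the file: ifs.tellg() is not necessarily equal to the file size after opening the file at the end, and ifs.seekg(0) is theoretically not the correct way to locate the start of the file (even though it works in practice in most places). That is why the code measures the distance between the end position and the position reached by seekg(0, std::ios::beg).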

The error messages from std::strerror(errno) are guaranteed to work on POSIX systems (that should include Microsoft, but I'm not sure).

Obviously you can use std::filesystem::path const& filepath instead of std::string if you prefer.
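As a rough sketch of that substitution, assuming a C++17 standard library in which std::ifstream can be constructed directly from a std::filesystem::path: load_file_from_path is a hypothetical name, and the fixed error strings are a simplification of the std::strerror(errno) messages used in load_file.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch only: same approach as load_file, with a path parameter.
std::vector<std::byte> load_file_from_path(std::filesystem::path const& filepath)
{
    // Open in binary mode, positioned at the end of the file.
    std::ifstream ifs(filepath, std::ios::binary | std::ios::ate);
    if(!ifs)
        throw std::runtime_error(filepath.string() + ": cannot open file");

    auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);

    // Size is the distance between the end and start positions.
    auto size = std::size_t(end - ifs.tellg());
    if(size == 0)
        return {};

    std::vector<std::byte> buffer(size);
    if(!ifs.read(reinterpret_cast<char*>(buffer.data()),
                 static_cast<std::streamsize>(buffer.size())))
        throw std::runtime_error(filepath.string() + ": read failed");

    return buffer;
}
```

A caller that already has a std::string can pass it unchanged, since std::filesystem::path converts implicitly from a string.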

Also, especially for pre-C++17, you can use std::vector<unsigned char> or std::vector<char> if you don't have or don't want to use std::byte.

What is an idiomatic C++17 standard approach to reading binary files?
