Question

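I'm trying to concatenate two large files in C++ (like the UNIX cat command: cat file1 file2 > final).

I don't know how to do it, because every approach I have tried is very slow (for example, copying the second file into the first one line by line).

What is the best way to do this?

Sorry for being so brief, my English is not very good.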

Answer 1

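Use the standard streams in binary mode to do the job, and don't treat the contents as formatted data.

If you want to transfer the data in blocks, here is a demonstration: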

#include <fstream>
#include <vector>

std::size_t fileSize(std::ifstream& file)
{
    std::size_t size;

    file.seekg(0, std::ios::end);
    size = file.tellg();
    file.seekg(0, std::ios::beg);

    return size;
}

int main()
{
    // 1 MB; choose a convenient buffer size.
    const std::size_t blockSize = 1024 * 1024;

    std::vector<char> data(blockSize);
    std::ifstream first("first.txt", std::ios::binary),
                second("second.txt", std::ios::binary);
    std::ofstream result("result.txt", std::ios::binary);
    std::size_t firstSize  = fileSize(first);
    std::size_t secondSize = fileSize(second);

    for(std::size_t block = 0; block < firstSize/blockSize; block++)
    {
        first.read(&data[0], blockSize);
        result.write(&data[0], blockSize);
    }

    std::size_t firstFilerestOfData = firstSize%blockSize;

    if(firstFilerestOfData != 0)
    {
        first.read(&data[0], firstFilerestOfData);
        result.write(&data[0], firstFilerestOfData);
    }

    for(std::size_t block = 0; block < secondSize/blockSize; block++)
    {
        second.read(&data[0], blockSize);
        result.write(&data[0], blockSize);
    }

    std::size_t secondFilerestOfData = secondSize%blockSize;

    if(secondFilerestOfData != 0)
    {
        second.read(&data[0], secondFilerestOfData);
        result.write(&data[0], secondFilerestOfData);
    }

    first.close();
    second.close();
    result.close();

    return 0;
}
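
The demonstration above needs each file's size up front and does not check for errors. As a possible variant, the sketch below reads until the stream runs dry and uses gcount() to write only what was actually read, so no size query is needed. The names appendFile and concatenate are hypothetical helpers introduced for illustration, and buffer.data() assumes C++11 or later.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: appends the whole of `path` to `out` in blocks of
// buffer.size() bytes. Returns false if the input cannot be opened or a
// write fails.
bool appendFile(const std::string& path, std::ofstream& out,
                std::vector<char>& buffer)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return false;

    const std::streamsize blockSize =
        static_cast<std::streamsize>(buffer.size());

    // read() reports failure on the short final block, so the loop also
    // continues while gcount() says some bytes were still delivered.
    while (in.read(buffer.data(), blockSize) || in.gcount() > 0)
    {
        out.write(buffer.data(), in.gcount());
        if (!out)
            return false;
    }
    return true;
}

// Hypothetical wrapper mirroring `cat first second > result`.
bool concatenate(const std::string& first, const std::string& second,
                 const std::string& result)
{
    std::ofstream out(result.c_str(), std::ios::binary | std::ios::trunc);
    if (!out)
        return false;

    std::vector<char> buffer(1024 * 1024); // 1 MB, same guess as above
    return appendFile(first, out, buffer) && appendFile(second, out, buffer);
}
```

A call such as concatenate("first.txt", "second.txt", "result.txt") would then stand in for the body of main() above. The same buffer is reused for both inputs, so only one allocation is made.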

Answer 2

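If you are using std::fstream, don't. It is intended mainly for formatted input/output, and its char-level operations are slower than you'd expect. Use std::filebuf directly instead. This is in addition to the suggestions in the other answers, in particular the one about using a larger buffer size.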
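
To make the suggestion concrete, here is a minimal sketch of what using std::filebuf directly might look like: blocks are moved with sgetn() and sputn(), bypassing the std::istream/std::ostream layer entirely. The helper names copyAll and concatenateWithFilebuf, and the 64 KB default block size, are assumptions made for illustration; whether this is actually faster should be measured on the target platform.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: copies everything remaining in `in` to `out` using
// only the stream-buffer interface.
bool copyAll(std::filebuf& in, std::filebuf& out,
             std::size_t blockSize = 64 * 1024)
{
    std::vector<char> buffer(blockSize);
    std::streamsize got;

    // sgetn() returns the number of bytes actually read; 0 means end of file.
    while ((got = in.sgetn(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()))) > 0)
    {
        if (out.sputn(buffer.data(), got) != got)
            return false; // short write
    }
    return true;
}

// Hypothetical wrapper: writes `first` followed by `second` into `result`.
bool concatenateWithFilebuf(const std::string& first,
                            const std::string& second,
                            const std::string& result)
{
    std::filebuf out;
    if (!out.open(result.c_str(),
                  std::ios::out | std::ios::binary | std::ios::trunc))
        return false;

    const std::string inputs[] = { first, second };
    for (const std::string& path : inputs)
    {
        std::filebuf in;
        if (!in.open(path.c_str(), std::ios::in | std::ios::binary))
            return false;
        if (!copyAll(in, out))
            return false;
    }

    // close() flushes pending output and returns a null pointer on failure.
    return out.close() != nullptr;
}
```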

Answer 3

Using plain old C++:

#include <fstream>

std::ifstream file1("x", std::ios_base::in | std::ios_base::binary);
std::ofstream file2("y", std::ios_base::app | std::ios_base::binary);
file2 << file1.rdbuf();

The Boost headers claim that copy() is optimized in some cases, but I'm not sure whether that matters:

#include <boost/iostreams/copy.hpp>
// The following four overloads of copy_impl() optimize 
// copying in the case that one or both of the two devices
// models Direct (see 
// http://www.boost.org/libs/iostreams/doc/index.html?path=4.1.1.4)

boost::iostreams::copy(file1, file2);

Update

The Boost copy function is compatible with a wide variety of types, so it can be combined with Pavel Minaev's suggestion of using std::filebuf:

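std::filebuf file1, file2;

file1.open("x", std::ios_base::in | std::ios_base::binary);
file2.open("y", std::ios_base::app | std::ios_base::binary);

// setbuf() is protected; pubsetbuf() is the public entry point. Whether the
// request is honored at this point is implementation-defined.
file1.pubsetbuf(NULL, 64 * 1024);
file2.pubsetbuf(NULL, 64 * 1024);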

boost::iostreams::copy(file1, file2);

Of course, the actual optimal buffer size depends on many variables; 64k is just a wild guess.
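
For reference, the plain rdbuf() approach from the top of this answer can be written out as a small self-contained sketch that matches the `cat file1 file2 > final` form in the question. The helper names appendViaRdbuf and concatenateViaRdbuf are hypothetical. One detail worth noting: inserting a stream buffer with operator<< sets failbit on the output stream when no characters are inserted, so this sketch skips empty inputs explicitly instead of treating them as errors.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: streams all of `path` into `out` via rdbuf().
bool appendViaRdbuf(const std::string& path, std::ofstream& out)
{
    std::ifstream in(path.c_str(),
                     std::ios_base::in | std::ios_base::binary);
    if (!in)
        return false;

    // An empty input would make `out << in.rdbuf()` set failbit on `out`,
    // so detect that case first and treat it as "nothing to copy".
    if (in.peek() == std::ifstream::traits_type::eof())
        return true;

    out << in.rdbuf();
    return static_cast<bool>(out);
}

// Hypothetical wrapper: writes `first` followed by `second` into `result`.
bool concatenateViaRdbuf(const std::string& first,
                         const std::string& second,
                         const std::string& result)
{
    std::ofstream out(result.c_str(),
                      std::ios_base::binary | std::ios_base::trunc);
    if (!out)
        return false;

    return appendViaRdbuf(first, out) && appendViaRdbuf(second, out);
}
```

Unlike the snippet at the top of this answer, which appends "x" to an existing "y", this version writes a fresh output file and leaves both inputs untouched.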

Answer 4

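Here is an alternative that may or may not be faster, depending on the file sizes and the memory available on your machine. It reads the whole second file into memory at once; if memory is tight, you can shrink the buffer size and loop over f2.read to fetch the data in chunks and write each chunk to f1.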

#include <fstream>
#include <iostream>

using namespace std;

int main(int argc, char *argv[])
{
        ofstream f1("test.txt", ios_base::app | ios_base::binary);
        ifstream f2("test2.txt", ios_base::in | ios_base::binary);

        f2.seekg(0,ifstream::end);
        unsigned long size = f2.tellg();
        f2.seekg(0);

        char *contents = new char[size];
        f2.read(contents, size);
        f1.write(contents, size);

        delete[] contents;
        f1.close();
        f2.close();

        return 0;
}

Merging large files
