Question

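I have two text files written with std::ofstream, each over 100 MB, and I want to concatenate them. Using fstreams to hold the data in order to create a single file usually fails with an out-of-memory error, because the data is too large.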

Is there a way to merge them faster than O(n)?

File 1 (160MB):

0 1 3 5
7 9 11 13
...
...
9187653 9187655 9187657 9187659

File 2 (120MB):

a b c d e f g h i j
a b c d e f g h j i
a b c d e f g i h j
a b c d e f g i j h
...
...
j i h g f e d c b a

Merged (280MB):

0 1 3 5
7 9 11 13
...
...
9187653 9187655 9187657 9187659 
a b c d e f g h i j
a b c d e f g h j i
a b c d e f g i h j
a b c d e f g i j h
...
...
j i h g f e d c b a

File generation:

std::ofstream a_file ( "file1.txt" );
std::ofstream b_file ( "file2.txt" );

while (/* whatever */) {
    a_file << num << std::endl;
}

while (/* whatever */) {
    b_file << character << std::endl;
}

// merge them here, doesn't matter if output is one of them or a new file
a_file.close();
b_file.close();

Answer 1

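Assuming you don't want to do any processing and just want to concatenate the two files into a third, you can do this by streaming the files' buffers: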

std::ifstream if_a("a.txt", std::ios_base::binary);
std::ifstream if_b("b.txt", std::ios_base::binary);
std::ofstream of_c("c.txt", std::ios_base::binary);

of_c << if_a.rdbuf() << if_b.rdbuf();

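I have done this with files of up to 100MB without any problems. You are effectively letting C++ and the library handle whatever buffering is required. It also means you don't need to worry about file positions if your files get really big.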
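A self-contained sketch of this approach with error checks added; the names `appendStream` and `concatenateFiles` are hypothetical. One caveat worth knowing: the `operator<<` overload that takes a `std::streambuf*` sets `failbit` on the output stream when it inserts no characters, so in the chained one-liner above an empty first file would stop the second file from being copied. The sketch therefore treats an empty input as a no-op.

```cpp
#include <fstream>
#include <istream>
#include <ostream>
#include <string>

// Copies everything left in `in` to `out` via operator<<(std::streambuf*).
// An empty input counts as success, because that overload sets failbit on
// `out` when it inserts no characters.
bool appendStream(std::ostream& out, std::istream& in) {
    if (in.peek() == std::istream::traits_type::eof()) {
        return true;  // nothing to copy
    }
    out << in.rdbuf();
    return static_cast<bool>(out);
}

// Hypothetical helper: writes `first` followed by `second` into `output`
// without loading either file fully into memory.
bool concatenateFiles(const std::string& first, const std::string& second,
                      const std::string& output) {
    std::ifstream inA(first, std::ios_base::binary);
    std::ifstream inB(second, std::ios_base::binary);
    std::ofstream out(output, std::ios_base::binary);
    if (!inA || !inB || !out) {
        return false;  // one of the files could not be opened
    }
    bool ok = appendStream(out, inA) && appendStream(out, inB);
    out.close();  // flushes buffered data; sets failbit if that fails
    return ok && !out.fail();
}
```

Usage would be `concatenateFiles("a.txt", "b.txt", "c.txt");`, which returns `false` if any file cannot be opened or a write fails.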

An alternative, if you just want to copy b.txt onto the end of a.txt, is to open a.txt with the append flag and seek to the end:

std::ofstream of_a("a.txt", std::ios_base::binary | std::ios_base::app);
std::ifstream if_b("b.txt", std::ios_base::binary);

of_a.seekp(0, std::ios_base::end);
of_a << if_b.rdbuf();
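Applied to the files from the question, the append variant might look like the following sketch. It assumes the two `std::ofstream` writers have already been closed (or flushed), since data still sitting in their buffers would otherwise be missing from the file being read. The name `appendFileTo` is hypothetical, and the `seekp` call is left out because in append mode every write goes to the end of the file regardless.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: appends the contents of `source` to the end of
// `target`, modifying `target` in place.
bool appendFileTo(const std::string& target, const std::string& source) {
    std::ofstream out(target, std::ios_base::binary | std::ios_base::app);
    std::ifstream in(source, std::ios_base::binary);
    if (!out || !in) {
        return false;
    }
    // operator<<(std::streambuf*) sets failbit when it copies nothing,
    // so an empty source file is handled as a no-op.
    if (in.peek() != std::ifstream::traits_type::eof()) {
        out << in.rdbuf();
    }
    out.close();  // flush and report write errors through failbit
    return !out.fail();
}
```

For the question's code, that would mean calling `a_file.close(); b_file.close();` first and then `appendFileTo("file1.txt", "file2.txt");`.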

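These methods work by passing the input stream's std::streambuf to the output stream's operator<<, using the overload that takes a streambuf parameter (see the reference documentation for std::basic_ostream::operator<<). As described there, if no error occurs, the streambuf's contents are inserted unformatted into the output stream until end of file is reached.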
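The behaviour of that overload can be observed without touching the file system by using string streams. This small sketch (the name `copyThroughStreambuf` is only illustrative) shows that characters are copied verbatim, whitespace and newlines included, unlike a formatted `>>` loop, and that `failbit` is set on the destination when the source has nothing to give.

```cpp
#include <sstream>
#include <string>

// Copies `input` through the operator<<(std::streambuf*) overload and returns
// what arrived on the other side; `ok` reports whether the destination
// stream is still in a good state afterwards.
std::string copyThroughStreambuf(const std::string& input, bool& ok) {
    std::istringstream src(input);
    std::ostringstream dst;
    dst << src.rdbuf();  // unformatted: whitespace and newlines are kept
    ok = !dst.fail();    // failbit is set if no characters were inserted
    return dst.str();
}
```

Passing `"0 1 3 5\n  a b\tc\n"` returns exactly that string with `ok == true`; passing an empty string returns an empty string with `ok == false`.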

Answer 2

Is there a way to merge them faster than O(n)?

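That would mean processing the data without passing over it even once. You cannot merge the files without reading the data at least once (short answer: no).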

To read the data, you should consider unformatted, block-wise reads (look at std::fstream::read).
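A minimal sketch of reading with `read` in fixed-size blocks; the name `appendFileInBlocks` and the 1 MiB default block size are assumptions, not something the answer specifies. Only one block is held in memory at a time, and `gcount()` tells how many bytes the last, usually partial, read actually delivered. Every byte is still read and written once, so this remains O(n).

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: appends `source` to `target` by reading fixed-size
// blocks with std::ifstream::read, so memory use stays at one block.
bool appendFileInBlocks(const std::string& target, const std::string& source,
                        std::size_t blockSize = 1 << 20) {  // 1 MiB, arbitrary
    std::ifstream in(source, std::ios_base::binary);
    std::ofstream out(target, std::ios_base::binary | std::ios_base::app);
    if (!in || !out || blockSize == 0) {
        return false;
    }
    std::vector<char> block(blockSize);
    // The last read() is usually short and sets failbit, but gcount()
    // still reports how many bytes it stored, so loop while it is > 0.
    while (in.read(block.data(), static_cast<std::streamsize>(block.size())) ||
           in.gcount() > 0) {
        out.write(block.data(), in.gcount());
    }
    out.close();  // flush and report write errors through failbit
    return !out.fail();
}
```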

Answer 3

On Windows:

system ("copy File1+File2 OutputFile");

On Linux:

system ("cat File1 File2 > OutputFile");

But the real answer is simple: don't read the whole file into memory! Read the input files in small chunks:

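#include <cstddef>
#include <fstream>
#include <vector>

// Copies input_file to output_file one chunk at a time, so only
// buffer_size bytes are held in memory at once.
void Cat (std::ifstream& input_file, std::ofstream& output_file)
{
  const std::size_t buffer_size = 64 * 1024;  // 64 KB; any reasonable size works
  std::vector<char> buffer (buffer_size);

  // The final read() is usually short and sets failbit, but gcount()
  // still reports the bytes it stored, so keep going while it is > 0.
  while (input_file.read (buffer.data (), buffer_size) || input_file.gcount () > 0)
  {
    output_file.write (buffer.data (), input_file.gcount ());
  }
}

int main ()
{
  std::ofstream output_file ("OutputFile", std::ios_base::binary);

  std::ifstream input_file ("File1", std::ios_base::binary);
  Cat (input_file, output_file);
  input_file.close ();

  input_file.open ("File2", std::ios_base::binary);  // since C++11, open() also clears the old EOF state
  Cat (input_file, output_file);
  input_file.close ();
}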

Answer 4

It depends on whether you want to use "pure" C++ for this. Personally, at the expense of portability, I would be tempted to write:

#include <cstdlib>
#include <sstream>

int main(int argc, char* argv[]) {
    std::ostringstream command;

    command << "cat "; // Linux Only, command for Windows is slightly different

    for (int i = 2; i < argc; ++i) { command << argv[i] << " "; }

    command << "> ";

    command << argv[1];

    return system(command.str().c_str());
}

Is this good C++ code? No, not really (it is not portable, and it does not escape the command arguments).

But it will get you further than where you are now.

As for a "real" C++ solution, with all the ugliness that streams can manage...

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

static size_t const BufferSize = 8192; // 8 KB

void appendFile(std::string const& outFile, std::string const& inFile) {
    std::ofstream out(outFile, std::ios_base::app |
                               std::ios_base::binary |
                               std::ios_base::out);

    std::ifstream in(inFile, std::ios_base::binary |
                             std::ios_base::in);

    std::vector<char> buffer(BufferSize);
    while (in.read(&buffer[0], buffer.size())) {
        out.write(&buffer[0], buffer.size());
    }

    // Fails when "read" encounters EOF,
    // but potentially still writes *some* bytes to buffer!
    out.write(&buffer[0], in.gcount());
}

Concatenate two huge files in C++
