Question

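My code writes a vector with more than 10 million elements to text files. I used clock() to time the program, and the writefile function is the slowest part of it. Is there a better way to write the files than the method below?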

void writefile(vector<fields>& fieldsvec, ofstream& sigfile, ofstream& noisefile)
/* Writes clean and noise data to respective files
 *
 * fieldsvec: vector of parsed records; nflag selects the destination file
 * sigfile: file to store clean data
 * noisefile: file to store noise data
 */
{
    for(unsigned int i=0; i<fieldsvec.size(); i++)
    {
        if(fieldsvec[i].nflag==false)
        {
            sigfile << fieldsvec[i].timestamp << ";" << fieldsvec[i].price << ";" << fieldsvec[i].units;
            sigfile << endl;
        }
        else
        {
            noisefile << fieldsvec[i].timestamp << ";" << fieldsvec[i].price << ";" << fieldsvec[i].units;
            noisefile << endl;
        }
    }
}

My struct is:

struct fields
// Stores a parsed line of a file
{
public:
    string timestamp;
    float price;
    float units;
    bool nflag; //flag if noise (TRUE=NOISE)
};

Answer 1

I'd suggest getting rid of endl. It flushes the buffer on every line, which greatly increases the number of system calls.

Writing '\n' instead of endl should be a big improvement.

By the way, the code can be simplified:

ofstream* files[2] = { &sigfile, &noisefile };  // C++ has no arrays of references; nflag false -> 0, true -> 1
for(unsigned int i=0; i<fieldsvec.size(); i++)
  *files[fieldsvec[i].nflag] << fieldsvec[i].timestamp << ';' << fieldsvec[i].price << ';' << fieldsvec[i].units << '\n';

Answer 2

You could write the files in binary rather than text format to speed up writing, as described in the first answer of this SO question:

file.open(filename.c_str(), ios_base::out | ios_base::binary);
...
// The following writes a vector into a file in binary format
vector<double> v;
const char* pointer = reinterpret_cast<const char*>(v.data());  // data() is safe even when v is empty; &v[0] is not
size_t bytes = v.size() * sizeof(v[0]);
file.write(pointer, bytes);

From the same link, the OP reports:

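Replacing std::endl with \n sped up the code by 1%
Concatenating everything to be written into one stream and writing it all at once at the end sped up the code by 7%
Switching from text format to binary format sped up the code by 90%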

Answer 3

One major speed killer is converting the numbers to text.

For the raw file output itself, ofstream's default buffering should already be quite efficient.

You should pass the vector as a const reference. It may not make a big difference, but it does allow some compiler optimizations.

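If you suspect the stream is being slowed down by the many separate writes, you could try building each line into a string with snprintf and writing it in one go. Only do this if your timestamp has a known size. Of course, this adds an extra copy, because the string then has to be copied into the output buffer. Experiment.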

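Beyond that, it starts to get messy. When you need to tune file performance, you have to start tailoring the buffering to your application. This usually means turning off the library's buffering or caching, sector-aligning your own buffers, and writing large chunks.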

How to efficiently write a vector of structs to a file?
