Question

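After profiling my program, I found that the function print takes a lot of time to execute. How can I send "raw" byte output directly to stdout instead of using fwrite, and make it faster (all 9 bytes in print() need to go to stdout at the same time)?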

void print(){
    unsigned char temp[9];

    temp[0] = matrix[0][0];
    temp[1] = matrix[0][1];
    temp[2] = matrix[0][2];
    temp[3] = matrix[1][0];
    temp[4] = matrix[1][1];
    temp[5] = matrix[1][2];
    temp[6] = matrix[2][0];
    temp[7] = matrix[2][1];
    temp[8] = matrix[2][2];

    fwrite(temp,1,9,stdout);

}

matrix is defined globally as unsigned char matrix[3][3];

Answer 1

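IO is not a cheap operation. It is, in fact, a blocking operation: when you call write, the OS can preempt your process and let more CPU-bound processes run until the IO device you are writing to completes the operation.

The only lower-level function you can use (if you are developing on a *nix machine) is the raw write function, but even then your performance will not be much faster than it is now. Simply put: IO is expensive.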
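For reference, here is a minimal sketch of what calling the raw write function involves on a POSIX system. The helper name write_all is hypothetical, not part of any library; it exists because write() may accept fewer bytes than requested, so a caller that needs all 9 bytes out has to loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `len` bytes are out.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                 /* write() may take fewer bytes than asked */
        len -= (size_t)n;
    }
    return 0;
}
```

In the question's print() this would be called as write_all(STDOUT_FILENO, temp, sizeof temp). Each call is a system call, so it bypasses stdio buffering entirely.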

Answer 2

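The top-rated answer claims that IO is slow.

Here is a quick benchmark with a buffer large enough to take the OS out of the critical performance path, but only if you are willing to receive your output in giant bursts. If latency to the first byte is your problem, you need to run in "dribs" mode.

Writing 10 million records from a 9-byte array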

Mint 12 AMD64 on 3GHz CoreDuo under gcc 4.6.1

   340ms   to /dev/null 
   710ms   to 90MB output file 
 15254ms   to 90MB output file in "dribs" mode

FreeBSD 9 AMD64 on 2.4GHz CoreDuo under clang 3.0

   450ms   to /dev/null 
   550ms   to 90MB output file on ZFS triple mirror
  1150ms   to 90MB output file on FFS system drive
 22154ms   to 90MB output file in "dribs" mode

There is nothing slow about IO if you can afford to buffer properly.

#include <stdio.h> 
#include <assert.h> 
#include <stdlib.h>
#include <string.h>

int main (int argc, char* argv[]) 
{
    int dribs = argc > 1 && 0==strcmp (argv[1], "dribs");
    int err;
    int i; 
    enum { BigBuf = 4*1024*1024 };
    char* outbuf = malloc (BigBuf); 
    assert (outbuf != NULL); 
    err = setvbuf (stdout, outbuf, _IOFBF, BigBuf); // full buffering (not line buffering)
    assert (err == 0);

    enum { ArraySize = 9 };
    char temp[ArraySize] = {0};  // initialized so fwrite does not read indeterminate bytes
    enum { Count = 10*1000*1000 }; 

    for (i = 0; i < Count; ++i) {
        fwrite (temp, 1, ArraySize, stdout);    
        if (dribs) fflush (stdout); 
    }
    fflush (stdout);  // seems to be needed after setting own buffer
    fclose (stdout);
    if (outbuf) { free (outbuf); outbuf = NULL; }
}

Answer 3

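Perhaps your problem is not that fwrite() is slow, but that it is buffered. Try calling fflush(stdout) after the fwrite().

It all depends on your definition of slow in this context.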
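The buffering effect can be observed directly. The following is a sketch only, assuming a POSIX system; flush_demo is a hypothetical name. It feeds a pipe through a fully buffered stdio stream and checks what the reading end can see before and after fflush().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demo: stores the result of read() on the pipe before and
 * after fflush() on the stream that feeds it. Returns 0 on success. */
int flush_demo(long *before, long *after)
{
    int fds[2];
    unsigned char rec[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    unsigned char got[9];
    FILE *out;

    assert(before != NULL && after != NULL);
    if (pipe(fds) != 0)
        return -1;
    /* Non-blocking read end: an empty pipe yields -1 (EAGAIN), not a hang. */
    if (fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK) == -1)
        return -1;
    out = fdopen(fds[1], "w");
    if (out == NULL)
        return -1;
    setvbuf(out, NULL, _IOFBF, BUFSIZ);             /* force full buffering */

    fwrite(rec, 1, sizeof rec, out);                /* stays in the stdio buffer */
    *before = (long)read(fds[0], got, sizeof got);  /* nothing has arrived yet */

    fflush(out);                                    /* pushes the bytes to the pipe */
    *after = (long)read(fds[0], got, sizeof got);   /* now the record is readable */

    fclose(out);
    close(fds[0]);
    return 0;
}
```

Flushing after every record gives the lowest latency but costs one system call per 9 bytes, which is the trade-off the "dribs" numbers in Answer 2 measure.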

Answer 4

The rawest form of output you can do is probably the write system call, like this

write (1, matrix, 9);

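1 is the file descriptor for standard output (0 is standard input, 2 is standard error). Your standard output can only be written as fast as whatever is reading it at the other end (i.e. the terminal, or the program you are piping into), which might be rather slow.

I'm not 100% sure, but you could try setting non-blocking IO on fd 1 (using fcntl) and hope the OS will buffer the data for you until it is consumed by the other end. It's been a while, but I think it works like this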

fcntl (1, F_SETFL, O_NONBLOCK);

YMMV, though. Please correct me if I got the syntax wrong; as I said, it's been a while.
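A somewhat more careful variant of the fcntl call above, as a sketch for POSIX systems: F_SETFL replaces the descriptor's status flags, so the usual pattern is to read them with F_GETFL first and OR in O_NONBLOCK. The names set_nonblocking and try_write are hypothetical. Note that non-blocking mode does not make the OS buffer without limit: once the pipe or terminal buffer is full, write() fails with EAGAIN and the caller has to retry later.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: enable O_NONBLOCK while keeping the other status
 * flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: returns the number of bytes written, 0 if the
 * descriptor would block right now, or -1 on any other error. */
ssize_t try_write(int fd, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    n = write(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;               /* reader is not keeping up; retry later */
    return n;
}
```

For the question's case this would be set_nonblocking(1) once at startup, then try_write(1, matrix, 9) per record, with some policy for records that could not be written.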

Answer 5

All printing is fairly slow, although iostreams are especially slow for printing.

Your best bet would be to use printf, something along the lines of:

printf("%c%c%c%c%c%c%c%c%c\n", matrix[0][0], matrix[0][1], matrix[0][2], matrix[1][0],
  matrix[1][1], matrix[1][2], matrix[2][0], matrix[2][1], matrix[2][2]);

Answer 6

You can simply do:

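// temp is not NUL-terminated, so operator<< would read past the array; write() takes an explicit length
std::cout.write(reinterpret_cast<const char*>(temp), sizeof temp);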

printf is more C-style.

Still, IO operations are costly, so use them wisely.

Answer 7

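As everyone has pointed out, IO in a tight inner loop is expensive. When I need to debug, I usually end up printing the matrix with cout only when some condition is met.

If your app is a console app, try redirecting its output to a file; that will be a lot faster than refreshing the console. For example: app.exe > matrixDump.txt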

Answer 8

What's wrong with:

fwrite(matrix,1,9,stdout);

One-dimensional and two-dimensional arrays occupy memory the same way.
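A small sketch of why this works: a 3x3 array of unsigned char is 9 contiguous bytes in row-major order, so the copy into temp is unnecessary. The wrapper name print_matrix is hypothetical; matrix is declared as in the question.

```c
#include <assert.h>
#include <stdio.h>

unsigned char matrix[3][3];     /* same declaration as in the question */

/* Hypothetical wrapper: writes the whole matrix in one fwrite call and
 * returns the number of bytes written (9 on success). */
size_t print_matrix(FILE *out)
{
    assert(out != NULL);
    /* sizeof matrix is 9: rows are laid out back to back with no padding. */
    return fwrite(matrix, 1, sizeof matrix, out);
}
```

Calling print_matrix(stdout) emits matrix[0][0], matrix[0][1], ... matrix[2][2] in exactly the order the original print() copies them.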

Answer 9

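Try running the program twice: once with output and once without. You will notice that overall, the one without IO is faster. You could also fork the process (or create a thread), so that one writes to the file (stdout) while the other does the computation.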

Answer 10

So first of all, don't print on every entry. Basically, what I'm saying is: don't do it like this.

for(int i = 0; i<100; i++){
    printf("Your stuff");
}

Instead, allocate a buffer on the stack or on the heap, store your data in it, and then push that buffer to stdout in one go, like this

char *buffer = malloc(100);
for(int i = 0; i<100; i++){
    buffer[i] = 1; //your byte value goes here
}

//once you are done, print it to the console with
write(1, buffer, 100);

But in your case, just use write(1, temp, 9);
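Applied to the 9-byte records from the question, the batching idea could look roughly like the sketch below. Everything here is an assumption for illustration: the names batch_add and batch_flush, and the batch size of 100 records. It is a simplified version without EINTR handling.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

enum { RecordSize = 9, BatchRecords = 100 };    /* assumed sizes */

static unsigned char batch[RecordSize * BatchRecords];
static size_t batch_len = 0;

/* Write out whatever has accumulated. Returns 0 on success, -1 on error. */
int batch_flush(int fd)
{
    size_t off = 0;

    while (off < batch_len) {
        ssize_t n = write(fd, batch + off, batch_len - off);
        if (n < 0)
            return -1;          /* simplified: no EINTR retry */
        off += (size_t)n;
    }
    batch_len = 0;
    return 0;
}

/* Append one record, flushing first if the buffer is already full. */
int batch_add(int fd, const unsigned char rec[RecordSize])
{
    assert(rec != NULL);
    if (batch_len + RecordSize > sizeof batch && batch_flush(fd) != 0)
        return -1;
    memcpy(batch + batch_len, rec, RecordSize);
    batch_len += RecordSize;
    return 0;
}
```

With these sizes there is one write() call per 100 records instead of one per record; batch_flush(1) must be called once more before the program exits so the last partial batch is not lost.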

Answer 11

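I'm pretty sure you can improve output performance by increasing the buffer size, so that you make fewer fwrite calls. write might be faster, but I'm not sure. Just try this: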

❯ yes | dd of=/dev/null count=1000000 
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 2.18338 s, 234 MB/s

vs

> yes | dd of=/dev/null count=100000 bs=50KB iflag=fullblock
100000+0 records in
100000+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 2.63986 s, 1.9 GB/s

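The same applies to your code. Some tests over the last few days suggest that good buffer sizes are probably somewhere between 1 << 12 (= 4096) and 1 << 16 (= 65536) bytes.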

C/C++: best way to send a number of bytes to stdout
