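Correcting out-of-order printing from stream redirection

Time: 2015-01-09 20:02:59

Tags: python multiprocessing named-pipes io-redirection

I have a Python script that uses multiprocessing.pool.map to do some work. As it runs, it prints progress to stdout and errors to stderr. I decided it would be nice to have a separate log file for each stream, and after some thought I worked out that I should run it like this: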

time ./ecisSearch.py 58Ni.conf 4 1 > >(tee stdout.log) 2> >(tee stderr.log >&2)

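This gives me the log files and keeps the output on the appropriate streams. However, this is where the problem appears. If I run it without the redirection, I get this: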

$ time ./ecisSearch.py 58Ni.conf 4 1

2015-01-09 14:42:37.524333: This job will perform 4 fit(s)   //this is stdout

2015-01-09 14:42:37.524433: Threaded mapping of the fit function onto the starting point input set is commencing   //this is stdout

2015-01-09 14:42:37.526641: Starting run #: 0   //this is stdout
2015-01-09 14:42:37.527018: Starting run #: 1   //this is stdout
2015-01-09 14:42:37.529124: Starting run #: 2   //this is stdout
2015-01-09 14:42:37.529831: Starting run #: 3   //this is stdout
2015-01-09 14:42:54.052522: Test of std err writing in run 0 is finished   //this is stderr
2015-01-09 14:42:54.502284: Test of std err writing in run 1 is finished   //this is stderr
2015-01-09 14:42:59.952433: Test of std err writing in run 3 is finished   //this is stderr
2015-01-09 14:43:03.259783: Test of std err writing in run 2 is finished   //this is stderr

2015-01-09 14:43:03.260360: Finished fits in job #: 1 preparing to output data to file   //this is stdout

2015-01-09 14:43:03.275472: Job finished   //this is stdout


real    0m26.001s
user    0m44.145s
sys     0m32.626s

However, running it with the redirection produces the following output.

$ time ./ecisSearch.py 58Ni.conf 4 1 > >(tee stdout.log) 2> >(tee stderr.log >&2)
2015-01-09 14:55:13.188230: Test of std err writing in run 0 is finished   //this is stderr
2015-01-09 14:55:13.855079: Test of std err writing in run 1 is finished   //this is stderr
2015-01-09 14:55:19.526580: Test of std err writing in run 3 is finished   //this is stderr
2015-01-09 14:55:23.628807: Test of std err writing in run 2 is finished   //this is stderr
2015-01-09 14:54:56.534790: Starting run #: 0   //this is stdout
2015-01-09 14:54:56.535162: Starting run #: 1   //this is stdout
2015-01-09 14:54:56.538952: Starting run #: 3   //this is stdout
2015-01-09 14:54:56.563677: Starting run #: 2   //this is stdout

2015-01-09 14:54:56.531837: This job will perform 4 fit(s)   //this is stdout

2015-01-09 14:54:56.531912: Threaded mapping of the fit function onto the starting point input set is commencing   //this is stdout


2015-01-09 14:55:23.629427: Finished fits in job #: 1 preparing to output data to file   //this is stdout

2015-01-09 14:55:23.629742: Job finished   //this is stdout


real    0m27.376s
user    0m44.661s
sys     0m33.295s

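Just by looking at the timestamps we can see that something strange is going on here. Not only are the stderr and stdout streams not interleaved as they should be, but the stdout portion seems to show the content from the subprocesses first and then the content from the 'master' process, regardless of the order in which it was produced. I know that stderr is unbuffered and stdout is buffered, but that does not explain why the stdout information is out of order within its own stream. Also, what is not apparent from my post is that all of the stdout output waits until the end of execution before appearing on the screen.

My questions are as follows: Why is this happening? And, less importantly, is there a way to fix it?

1 Answer:

Answer 0: (score: 4)

Output to stdout is buffered: that is, the print statement actually writes to a buffer, and this buffer is only occasionally flushed to the terminal. Each process has its own separate buffer, which is why writes from different processes can appear out of order (this is a common problem, as in Why subprocess stdout to a file is written out of order?).

In this case, the output is in order when it goes to the terminal, but appears out of order when redirected. Why? This article explains: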

  
      
  • stdin is always buffered
  • stderr is always unbuffered
  • if stdout is a terminal, buffering is automatically set to line-buffered; otherwise it is set to buffered
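
To check which of these modes applies to a particular run, the stream can be inspected from inside the script. The following is only a sketch, assuming Python 3, where sys.stdout is normally an io.TextIOWrapper; the helper name describe_stream is made up for illustration. Python 3 implements its own buffering in the io module, so details can differ from the C stdio rules quoted above.

```python
import io
import sys


def describe_stream(stream):
    """Summarise whether a text stream is a terminal and how it flushes."""
    is_tty = stream.isatty()
    # line_buffering is an attribute of io.TextIOWrapper; fall back to False
    # for stream types that may not provide it.
    line_buffered = getattr(stream, "line_buffering", False)
    mode = "line-buffered" if line_buffered else "block-buffered or unbuffered"
    return "tty={} mode={}".format(is_tty, mode)


if __name__ == "__main__":
    # Report on stderr so the message is visible even when stdout is redirected.
    sys.stderr.write("stdout: " + describe_stream(sys.stdout) + "\n")
```

Running this once directly in a terminal and once with `> >(tee stdout.log)` should show stdout switching away from line-buffered mode.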

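So when the output goes to a terminal, it is flushed after every line and happens to appear in order. When redirected, a larger buffer is used (typically 4096 bytes). Since you are printing less than that, whichever subprocess finishes first gets flushed first.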
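
The effect of that larger buffer can be reproduced in isolation. This is a minimal sketch, not the asker's script: an io.BytesIO stands in for the pipe to tee, and the 4096-byte size is taken from the figure above (Python's own default, io.DEFAULT_BUFFER_SIZE, may differ). The function name write_then_flush is hypothetical.

```python
import io

# 4096 matches the figure quoted above; Python's own default
# (io.DEFAULT_BUFFER_SIZE) may differ.
BUFFER_SIZE = 4096


def write_then_flush(message):
    """Return what the destination holds before and after an explicit flush.

    `message` is bytes and is assumed to be shorter than BUFFER_SIZE.
    """
    destination = io.BytesIO()  # stands in for the pipe to tee
    writer = io.BufferedWriter(destination, buffer_size=BUFFER_SIZE)
    writer.write(message)
    before = destination.getvalue()  # still empty: the data sits in the buffer
    writer.flush()
    after = destination.getvalue()  # the message has now been delivered
    return before, after
```

For a short message such as `b"Starting run #: 0\n"`, nothing reaches the destination until flush() is called, which is why a process that prints only a few lines delivers them all at once when its buffer is finally flushed.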

The solution is to call flush(), or to disable buffering for the process entirely (see Disable output buffering).
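
As a rough illustration of the flush() approach, here is a sketch assuming Python 3.3 or later, where print accepts flush=True (in Python 2, call sys.stdout.flush() after each print instead). The names log, run_fit and main are hypothetical stand-ins, not the functions in ecisSearch.py.

```python
import datetime
import sys
from multiprocessing import Pool


def log(message, stream=None):
    """Write a timestamped line and flush it immediately."""
    if stream is None:
        stream = sys.stdout  # resolved at call time, so redirection is honoured
    print("{}: {}".format(datetime.datetime.now(), message),
          file=stream, flush=True)


def run_fit(run_id):
    """Hypothetical worker standing in for the real fit function."""
    log("Starting run #: {}".format(run_id))
    # ... the actual fitting work would go here ...
    log("Test of std err writing in run {} is finished".format(run_id),
        stream=sys.stderr)
    return run_id


def main():
    """Map the worker over four inputs, flushing every message as it is written."""
    log("This job will perform 4 fit(s)")
    with Pool(processes=4) as pool:
        pool.map(run_fit, range(4))
    log("Job finished")

# When running as a script, call main() from an
# `if __name__ == "__main__":` guard.
```

Because every line is flushed as soon as it is written, each process hands its output to the pipe in the order it was produced. To disable buffering without touching the code, the interpreter can instead be started with `python -u` or with the environment variable `PYTHONUNBUFFERED=1`.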