Question

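I run a find command and pipe its output through tee (to write a log) and then xargs. Once I forgot to add xargs to the second pipe, and that is how I ran into this problem.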

Example:

% tree
.
├── a.sh
└── home
    └── localdir
        ├── abc_3
        ├── abc_6
        ├── mydir_1
        ├── mydir_2
        └── mydir_3

7 directories, 1 file

The content of a.sh is:

% cat a.sh
#!/bin/bash
LOG="/tmp/abc.log"

find home/localdir -name "mydir*" -type d  -print | tee $LOG | echo

If the second pipe feeds a command such as echo or ls, writing the log occasionally fails.

Here are some examples from running ./a.sh several times:

% bash -x ./a.sh; cat /tmp/abc.log  // this tee failed
+ LOG=/tmp/abc.log
+ find home/localdir -name 'mydir*' -type d -print
+ tee /tmp/abc.log
+ echo


% bash -x ./a.sh; cat /tmp/abc.log  // this tee ok
+ LOG=/tmp/abc.log
+ find home/localdir -name 'mydir*' -type d -print
+ tee /tmp/abc.log
+ echo

home/localdir/mydir_2  // this is cat /tmp/abc.log output
home/localdir/mydir_3
home/localdir/mydir_1

Why does tee occasionally fail when I add a second pipe to some command (and forget xargs)?

Answer 1

The problem is that, by default, tee exits when a write to a pipe fails. So, consider:

find home/localdir -name "mydir*" -type d  -print | tee $LOG | echo

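If echo finishes first, the write to the pipe fails and tee exits. The timing, however, is not precise. Each command in the pipeline runs in its own subshell, and there are also the vagaries of buffering. So sometimes the log file is written before tee exits, and sometimes it is not.

For clarity, let us consider a simpler pipeline: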

$ seq 10 | tee abc.log | true; declare -p PIPESTATUS; cat abc.log
declare -a PIPESTATUS='([0]="0" [1]="0" [2]="0")'
1
2
3
4
5
6
7
8
9
10
$ seq 10 | tee abc.log | true; declare -p PIPESTATUS; cat abc.log
declare -a PIPESTATUS='([0]="0" [1]="141" [2]="0")'
$

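In the first run, every process in the pipeline exits successfully and the log file is written. In the second run of the same command, tee fails with exit code 141 and the log file is not written.

I used true instead of echo to show that there is nothing special about echo here. The problem exists for any command after tee that may not read its input.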
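The dependence on timing can be made visible by deliberately delaying one side of the pipe. The following is only a sketch, assuming bash and a scratch file from mktemp (the variable names are made up for illustration). In the first case the reader lingers, so tee has time to write both the pipe and the log. In the second case tee starts late, after true has already closed the read end, so the log is typically left empty.

```shell
#!/bin/bash
# Sketch: control which side of the pipe finishes first.
log=$(mktemp)   # hypothetical scratch log file

# Case 1: the reader stays alive for a second, so tee can write to the
# pipe and to the log before the read end is closed.
seq 10 | tee "$log" | { sleep 1; true; }
slow_reader_lines=$(( $(wc -l < "$log") ))

# Case 2: tee starts a second late, after `true` has exited and closed
# the read end, so tee's first write to stdout hits a broken pipe.
seq 10 | { sleep 1; tee "$log"; } | true
slow_writer_lines=$(( $(wc -l < "$log") ))

echo "reader slow: $slow_reader_lines lines; writer slow: $slow_writer_lines lines"
rm -f "$log"
```

The second count depends on SIGPIPE having its default disposition in the calling environment, so treat it as an illustration of the race rather than a guaranteed result.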

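Documentation

Recent versions of tee have an option to control the exit-on-pipe-failure behavior. From man tee in coreutils-8.25:

--output-error[=MODE]
       set behavior on write error. See MODE below

The possibilities for MODE are:

MODE determines behavior with write errors on the outputs: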
   'warn' diagnose errors writing to any output

   'warn-nopipe'
          diagnose errors writing to any output not a pipe

   'exit' exit on error writing to any output

   'exit-nopipe'
          exit on error writing to any output not a pipe
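The default MODE for the -p option is 'warn-nopipe'. The default operation when --output-error is not specified is to exit immediately on error writing to a pipe, and to diagnose errors writing to non-pipe outputs.

As you can see, the default behavior is to "exit immediately on error writing to a pipe". So if the attempt to write to the process after tee fails before tee has written the log file, tee exits without writing the log file.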
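Based on the man page excerpt above, one way to keep the log intact is to run tee with -p, so that a closed downstream pipe no longer ends the process. This is a minimal sketch, assuming GNU tee from a coreutils release that has -p (the excerpt is from 8.25) and a scratch file from mktemp. The input is made large enough that it cannot fit in the pipe buffer, which would normally make the failure very likely.

```shell
#!/bin/bash
# Sketch: tee -p (mode 'warn-nopipe') keeps writing the file even
# though `true` never reads from the pipe.
log=$(mktemp)   # hypothetical scratch log file

seq 100000 | tee -p "$log" | true
logged=$(( $(wc -l < "$log") ))

echo "lines logged with tee -p: $logged"
rm -f "$log"
```

On a tee without -p support the command will report an unknown option instead, so check the local man page first.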

Answer 2

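I debugged the tee source code, but I am not familiar with C on Linux, so I may have gotten something wrong.

tee is part of the coreutils package; its source is in src/tee.c.

First, it sets up buffering: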

setvbuf (stdout, NULL, _IONBF, 0); // for standard output
setvbuf (descriptors[i], NULL, _IONBF, 0);  // for file descriptor

So it is unbuffered?

Second, tee puts stdout in the first slot of the descriptors array and writes to each descriptor in a for loop:

/* In the array of NFILES + 1 descriptors, make
   the first one correspond to standard output.   */
descriptors[0] = stdout;
files[0] = _("standard output");
setvbuf (stdout, NULL, _IONBF, 0);

...

  for (i = 0; i <= nfiles; i++) {
    if (descriptors[i]
        && fwrite (buffer, bytes_read, 1, descriptors[i]) != 1)  // failed!!!
      {
        error (0, errno, "%s", files[i]);
        descriptors[i] = NULL;
        ok = false;
      }
    }

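For example, with tee a.log, descriptors[0] is stdout and descriptors[1] is a.log.

As @John1024 said, the commands in a pipeline run in parallel (I misunderstood this earlier). The second command in the pipe, such as echo, ls, or true, does not read its input, so it does not "wait" for input. If it finishes sooner, it closes the read end of the pipe before tee has finished writing its output. In the code above, the commented line then fails and nothing is written to the file descriptor.

Addendum:

strace shows that tee was killed by SIGPIPE: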

write(1, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n", 21) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=22649, si_uid=1000} ---
+++ killed by SIGPIPE +++
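The SIGPIPE in this trace matches the exit status 141 reported by PIPESTATUS in Answer 1: a shell reports a process killed by a signal as 128 plus the signal number. A small sketch of that arithmetic, assuming bash and the Linux signal number 13 for SIGPIPE:

```shell
#!/bin/bash
# Sketch: map the exit status seen in PIPESTATUS back to a signal name.
sigpipe=13                      # SIGPIPE's number on Linux (assumption)
status=$((128 + sigpipe))       # how the shell reports death by signal
sig_name=$(kill -l "$status")   # bash accepts an exit status here

echo "exit status $status -> SIG$sig_name"
```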

Answer 3

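Yes, piping tee into something that exits early (in your case, something that does not depend on reading tee's output) will cause intermittent errors. For a summary of this gotcha, see: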

http://www.pixelbeat.org/docs/coreutils-gotchas.html#tee
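Conversely, if the command after tee reads all of its input, as xargs would have in the original script, tee never sees a broken pipe. A minimal sketch, assuming bash and a scratch file from mktemp, with cat standing in for any consumer that drains the pipe:

```shell
#!/bin/bash
# Sketch: a downstream command that reads everything lets tee finish.
log=$(mktemp)   # hypothetical scratch log file

seq 100000 | tee "$log" | cat >/dev/null
tee_status=${PIPESTATUS[1]}     # must be read right after the pipeline
logged=$(( $(wc -l < "$log") ))

echo "tee exit status: $tee_status, lines logged: $logged"
rm -f "$log"
```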
