为什么来自子进程的管道有时会中断,有时不会中断?

时间:2018-05-04 14:41:32

标签: perl ipc

我正在使用Perl处理一个带有主脚本的系统,该脚本需要执行几个不同的工作脚本,这些脚本同时在系统上执行不同的任务。
这些脚本也不需要用Perl编写。它们可以是在命令行上可执行的任何程序,并通过STDOUTSTDERR以及EXITCODE返回结果。

所以我做了一个示例脚本hello.pl

#!/usr/bin/perl

print STDERR "HELLO ERROR 0 !\n";

print "hello script!!\n";

print "hello waiting 3 sec ...\n";

sleep 3;

print "hello list:\n";
print "cmd: 'ls -lah'\n";

print `ls -lah`;

print "res: '$?'\n";    

print "hello waiting 2 sec ...\n";

sleep 2;    

print STDERR "HELLO ERROR 1 !\n";

主脚本需要在一侧捕获STDOUT的结果,在另一侧捕获STDERR中的错误通知。

主脚本执行fork以非阻塞模式执行工作程序脚本,并使用waitpidselect来监视工作程序脚本的进度。

所以master.pl脚本的代码是这样的:

#!/usr/bin/perl

use strict;
use warnings;

use IO::Select;
use POSIX ":sys_wait_h";


sub Launch
{
  my $rprocess = shift;
  my $irs = 0;


  local *logreader;
  local *errorreader;
  my $logwriter   = undef;
  my $errorwriter = undef;

  pipe( *logreader,   $logwriter );
  pipe( *errorreader, $errorwriter );

  #Spawn the Child Process
  my $iprcpid = fork();

  #Check the Success of Process Forking
  if(defined $iprcpid)
  {           
    #------------------------
    #Sub Process Launch succeeded

    # Check whether parent/child process
    if($iprcpid > 0)
    {
      #------------------------
      #Parent Process

      close($logwriter);
      close($errorwriter);

      $rprocess->{"pid"} = $iprcpid;
      $rprocess->{"log_pipe"} = *logreader;
      $rprocess->{"error_pipe"} = *errorreader;

      $irs = 1;
    }
    elsif($iprcpid == 0)
    {
      #------------------------
      #Child Process

      my $ierr = 0;

      close(*logreader);
      close(*errorreader);

      open( STDOUT, ">&=", $logwriter );
      open( STDERR, ">&=", $errorwriter );


      #------------------------
      #Execute the configured Command

      print "cmd: '" . $rprocess->{"file"} . "'\n";

      print "cmd rng ...\n";

      print `$rprocess->{"file"}`; 

      $ierr = $?;

      print "closing transmission ...\n";   

      close STDOUT;
      close STDERR;

      exit $ierr; 
    }
    else    #An Error has ocurred in the Sub Process Launch
    {
        # Unable to fork
        print "ERROR: Sub Process '" . $rprocess->{"name"} . "' Launch failed: $!\n";
    }   #if($iprcpid > 0)
  }
  else    #An Error has ocurred in the Process Spawning   
  {
    # Unable to fork
    print "ERROR: Sub Process '" . $rprocess->{"name"} . "' Launch failed: $!\n";
  }  #if(defined $iprcpid)

  return $irs;
}

sub Read
{
  my $rprocess = shift;

  my $ppsel = $rprocess->{"select"};
  my $pplog = $rprocess->{"log_pipe"};
  my $pperr = $rprocess->{"error_pipe"};
  my @arrppselrdy = undef;
  my $pprdy = undef;
  my $srdln = "";
  my $irdcnt = 0;

  unless(defined $ppsel)
  {
    $ppsel = IO::Select->new();

    $ppsel->add($pplog);
    $ppsel->add($pperr);

    #Store the Selector Object
    $rprocess->{"select"} = $ppsel;
  }
  else  #The Selector was already created
  {
    $ppsel = $rprocess->{"select"};
  }  #unless(defined $ppsel)    

  while(@arrppselrdy = $ppsel->can_read(1))
  {
    foreach $pprdy (@arrppselrdy)
    {
      $irdcnt = sysread($pprdy, $srdln, 8192);

      if(defined $irdcnt)
      {
        if($irdcnt > 0)
        {
          if(fileno($pprdy) == fileno($pplog))
          {
            $rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): reading report ...\n";

            $rprocess->{"log"} .= $srdln;
          }
          elsif(fileno($pprdy) == fileno($pperr))
          {
            $rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): reading error ...\n";

            $rprocess->{"error"} .= $srdln;
          }   #if(fileno($pprdy) == fileno($pplog))
        }
        else    #End of Transmission
        {
          $rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): transmission done.\n";

          #Remove the Pipe File Handle
          $ppsel->remove($pprdy);  

        } #if($irdcnt > 0)
      }
      else  #Reading from the Pipe failed
      {
        #Remove the Pipe File Handle
        $ppsel->remove($pprdy);

        if($!)
        {
          $rprocess->{"error"} .= "ERROR: Sub Process " . $rprocess->{"name"} 
            . ": Read failed with [" . ($! * 1) . "]!\n"
            . "Message: '$!'\n";
        }  #if($!)
      }  #if(defined $irdcnt)
    }  #foreach $pprdy (@arrppselrdy)
  } #while(@arrppselrdy = $ppsel->can_read(1))

  $rprocess->{"log"} .= "try read done. '" . $ppsel->count . "' pipes left.\n";
}

sub Check
{
  my $rprocess = shift;
  my $irng = 0;

  if($rprocess->{"pid"} > 0
    && $rprocess->{"status"} < 0)
  {
    my $ifinishpid = waitpid($rprocess->{"pid"}, WNOHANG);

    if($ifinishpid > -1)
    {
      if($ifinishpid == 0)
      {
        $irng = 1;
      }
      else
      {
        $rprocess->{"status"} = $?;
      }

      #Read the Message Pipes
      Read($rprocess);

      if($ifinishpid > 0)
      {
        close $rprocess->{"log_pipe"};
        close $rprocess->{"error_pipe"};
      }
    }
    else  #Sub Process does not exist
    {
      $rprocess->{"error"} .= "ERROR: Sub Process does not exist!\n";
    }  #if($ifinishpid > 0)
  }  #if($rprocess->{"pid"} > 0 && $rprocess->{"status"} < 0)

  return $irng;
}



my %hshprocesses = ("hello1" => {"name" => "hello1", "pid" => -1, "file" => "./hello.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1}
  , "hello2" => {"name" => "hello2", "pid" => -1, "file" => "sleep 3 ; ./hello2.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1}
  , "hello3" => {"name" => "hello3", "pid" => -1, "file" => "./hello3.pl ; sleep 2"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1}
  , "hello4" => {"name" => "hello4", "pid" => -1, "file" => "./hello4.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1});


print "prcs launching ...\n";

foreach (keys %hshprocesses)
{
  unless(Launch($hshprocesses{$_}))
  {
    print "prc '" . $hshprocesses{$_}{"name"} . "': Launch failed!\n";
  }
}  #foreach (keys %hshprocesses)

print "prcs launched.\n";

my $irunningcount = 0;

do
{
  $irunningcount = 0;

  foreach (keys %hshprocesses)
  {
    $irunningcount++ if(Check($hshprocesses{$_}));
  }  #foreach (keys %hshprocesses)

  print "prc rng cnt: '$irunningcount'\n";
}
while($irunningcount > 0);

foreach (keys %hshprocesses)
{
  print "process (" . $hshprocesses{$_}{"pid"} . ") '" . $hshprocesses{$_}{"name"} . "':\n";
  print "status [" . $hshprocesses{$_}{"status"} . "]\n";
  print "log:\n"
    . "'" . $hshprocesses{$_}{"log"} . "'\n";
  print "error:\n"
    . "'" . $hshprocesses{$_}{"error"} . "'\n";
}  #foreach (keys %hshprocesses)

所以阅读logreadererrorreader我认为我能够捕获整个输出。

预期的报告将是这样的:

$ ./master.pl
prcs launching ...
prcs launched.
prc rng cnt: '4'
prc rng cnt: '3'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '0'
process (2742) 'hello4':
status [0]
log:
'pipe (3): reading report ...
cmd: './hello4.pl'
cmd rng ...
pipe (5): reading error ...
try read done. '2' pipes left.
pipe (5): reading error ...
pipe (3): reading report ...
hello4 script!!
hello waiting 3 sec ...
hello4 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
pipe (3): reading report ...
closing transmission ...
pipe (5): transmission done.
pipe (3): transmission done.
try read done. '0' pipes left.
try read done. '0' pipes left.
'
error:
'HELLO4 ERROR 0 !
HELLO4 ERROR 1 !
'
process (2743) 'hello1':
status [0]
log:
'pipe (4): reading report ...
cmd: 'sleep 3 ; ./hello.pl'
cmd rng ...
try read done. '2' pipes left.
pipe (7): reading error ...
try read done. '2' pipes left.
try read done. '2' pipes left.
pipe (7): reading error ...
pipe (4): reading report ...
hello script!!
hello waiting 3 sec ...
hello list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (7): transmission done.
pipe (4): transmission done.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
'
error:
'HELLO ERROR 0 !
HELLO ERROR 1 !
'
process (2745) 'hello3':
status [0]
log:
'pipe (6): reading report ...
cmd: './hello3.pl ; sleep 2'
cmd rng ...
pipe (9): reading error ...
try read done. '2' pipes left.
pipe (9): reading error ...
try read done. '2' pipes left.
pipe (6): reading report ...
hello3 script!!
hello waiting 3 sec ...
hello3 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (9): transmission done.
pipe (6): transmission done.
try read done. '0' pipes left.
'
error:
'HELLO3 ERROR 0 !
HELLO3 ERROR 1 !
'
process (2746) 'hello2':
status [0]
log:
'pipe (8): reading report ...
cmd: './hello2.pl'
cmd rng ...
pipe (11): reading error ...
try read done. '2' pipes left.
pipe (8): reading report ...
hello2 script!!
hello waiting 3 sec ...
hello2 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (11): reading error ...
pipe (8): transmission done.
pipe (11): transmission done.
try read done. '0' pipes left.
'
error:
'HELLO2 ERROR 0 !
HELLO2 ERROR 1 !
'

但只有当只有1个工作脚本时,这才有效 如果主脚本试图读取几个工作脚本,那么在执行工作脚本hello.pl之前,该传输管道将由子项关闭。 最后一件事是:

        cmd: './hello.pl'
        cmd rng ...

主脚本使用IO::Select模块通过选择器Object从logreadererrorreader读取。
这样我在进度日志中找到了:

        cmd: './hello.pl'
        cmd rng ...
        pipe (8): transmission done.

hello.pl中的所有内容都丢失了。

有时我仍会在错误报告中找到hello.pl STDERR的第一行

HELLO ERROR 0 !

为什么管道有时会断裂,有时不会? 我怎样才能预防这种情况发生?

现在有了这个测试脚本,我无法重现断管的错误 但即使在真实世界应用程序中,它有时会发生,有时也不会发生 我在一个带有perl v5.10和4个进程3运行和1个中断的系统上开发了这个应用程序。特别是如果它进行sleep调用。

在什么情况下管道会断裂?

编辑2018-05-05:我构建了主脚本master.pl并使用perl v5.24.1运行它

2 个答案:

答案 0 :(得分:0)

我认为你必须使用wait而不是waitpidwait等待任何孩子,而waitpid只等待一个特定的孩子。

答案 1 :(得分:-1)

由于没有人能在1个月内给我任何有用的答案或提示,我会发布我发现的关于断管现象的内容,以帮助其他可能遇到类似问题的人

最后,我可以追踪断管的现象 我添加了足够长的休眠时间,让我可以使用strace命令跟踪系统活动,它实际上记录了 Broken Pipe 错误:

$ strace -p 11729
Process 11729 attached
restart_syscall(<... resuming interrupted call ...>) = 0
write(1, "hello script!!\nhello waiting 20 "..., 67) = 67
pipe([3, 4])                            = 0
pipe([5, 6])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe670d5f9d0) = 13238
close(6)                                = 0
close(4)                                = 0
read(5, "", 4)                          = 0
close(5)                                = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff334b57b0) = -1 EINVAL (Invalid argument)
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "total 228K\ndrwxr-xr-x.  4 usr15 "..., 4096) = 1868
read(3, "", 4096)                       = 0
close(3)                                = 0
rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
wait4(13238, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13238
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13238, si_status=0, si_utime=0, si_stime=0} ---
rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({30, 0}, 0x7fff334b5b90)      = 0
write(2, "HELLO ERROR 1 !\n", 16)       = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=11729, si_uid=501} ---
+++ killed by SIGPIPE +++

它假定hello.pl正确完成并结束写错误消息write(2, "HELLO ERROR 1 !\n", 16) = -1的文件 但它发现管道已损坏,导致错误EPIPE (Broken pipe)导致其与+++ killed by SIGPIPE +++

崩溃

所以问题的答案
在什么情况下管道可能会中断?

  • 如果分叉的子进程已被终止

但为什么在管道损坏之前我无法读取输出?

削减输出

的现象
    cmd: './hello.pl'
    cmd rng ...
    pipe (8): transmission done.
  • 来自分叉子进程中print()的组合 和父进程中的sysread()。输出是 实际上缓冲在Child端,最后写入Parent 侧。

为什么在收到输出之前终止子进程?

  • 这来自实际中与物联合的结合 实现Object::DESTROY的真实世界应用程序 方法。
    该方法包括静止的kill()清除逻辑 运行理论上不应触发的子进程。
    但在真实世界应用程序中,进程并非即时 读取并收获导致他们被杀死的原因 Object::DESTROY逻辑。
    但我在执行日志中找不到这种情况,因为它已经被 Perl的垃圾收集器销毁了。

因此,某些进程会被杀死但其他进程则不会被杀死。有时它们会被杀死,但有时则不会。