How to wait for a grandchild process (Bash return value becomes -1 in Perl because of SIGCHLD)

Time: 2019-06-05 19:57:57

Tags: bash perl wait background-process child-process

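I have a Perl script (snippet below) that runs from cron to perform system checks. I fork a child as a timeout timer and reap it with $SIG{CHLD}. Perl makes several system calls to Bash scripts and checks their exit status. One Bash script fails about 5% of the time without any error: the script exits with 0, but Perl sees $? as -1 and $! as "No child processes".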

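This Bash script tests the compiler licenses, and an Intel icc process is left behind after the script finishes (see the ps output below). I think the icc zombie finishing forces Perl into the $SIG{CHLD} handler, which clobbers the $? status before it can be read.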

Compile status -1; No child processes
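A minimal, deterministic sketch of the mechanism suspected above (an illustration, not the poster's code): once some other `waitpid` call, standing in here for the `$SIG{CHLD}` handler, has reaped a child, the later wait that backticks or `system` perform for that same child fails with ECHILD ("No child processes") and leaves `$?` at -1. The helper name `status_after_early_reap` is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(ECHILD);

# Illustration of the suspected race, made deterministic: an extra
# waitpid(-1, ...) plays the role of the $SIG{CHLD} handler and reaps the
# child before the "real" wait for it happens.
sub status_after_early_reap {
    my $pid = fork() // die "fork: $!\n";
    POSIX::_exit(3) if $pid == 0;        # child: exit 3, skip END blocks

    my $reaped = waitpid(-1, 0);         # stand-in for the SIGCHLD handler
    my $handler_status = $?;             # the handler receives the real status

    my $rc = waitpid($pid, 0);           # the wait that backticks do afterwards
    my $errno = $! + 0;                  # ECHILD: "No child processes"
    return ($reaped == $pid, $handler_status, $rc, $?, $errno);
}
```

The "handler" ends up holding the real status (exit code 3), while the second wait returns -1 and sets `$?` to -1, which matches the "Compile status -1; No child processes" line above.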

#!/usr/bin/perl
use strict;
use POSIX ':sys_wait_h';

my $GLOBAL_TIMEOUT = 1200;

### Timer to notify if this program hangs
my $timer_pid;
$SIG{CHLD} = sub {
    local ($!, $?);
    while((my $pid = waitpid(-1, WNOHANG)) > 0)
    {
        if($pid == $timer_pid)
        {
            die "Timeout\n";
        }
    }
};

die "Unable to fork\n" unless(defined($timer_pid = fork));
if($timer_pid == 0)  # child
{
    sleep($GLOBAL_TIMEOUT);
    exit;
}
### End Timer

### Compile test
my @compile = `./compile_test.sh 2>&1`;
my $status = $?;
print "Compile status $status; $!\n";
if($status != 0)
{
    print "@compile\n";
}

END  # Timer cleanup
{
    if($timer_pid != 0)
    {
        $SIG{CHLD} = 'IGNORE';
        kill(15, $timer_pid);
    }
}

exit(0);

#!/bin/sh

cc compile_test.c
if [ $? -ne 0 ]; then
    echo "Cray compiler failure"
    exit 1
fi

module swap PrgEnv-cray PrgEnv-intel
cc compile_test.c
if [ $? -ne 0 ]; then
    echo "Intel compiler failure"
    exit 1
fi

wait
ps
exit 0

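The wait doesn't really wait, because cc invokes icc, which creates a zombie grandchild process that wait does not wait for (waiting on its PID doesn't work either). In this case, wait `pidof icc` gives "pid 31589 is not a child of this shell".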

user 31589     1  0 12:47 pts/15   00:00:00 icc

I just don't know how to fix this in either Bash or Perl.

Thanks, Chris

3 answers:

Answer 0 (score: 1)

Isn't this a use case for alarm? Throw away your SIGCHLD handler and write

my @compile;
local $? = -1;
eval {
    local $SIG{ALRM} = sub { die "Timeout\n" };
    alarm($GLOBAL_TIMEOUT);
    @compile = `./compile_test.sh 2>&1`;
    alarm(0);
};
my $status = $?;
die $@ if $@;    # re-throw "Timeout\n" (or any other error)

instead.
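The alarm idea can be packaged as a self-contained helper. This is a variant under stated assumptions, not the answerer's code: it swaps the backticks for a list-form piped `open`, so the child's PID is known and the command can be killed on timeout (with backticks the command would keep running after the `die`). `run_with_timeout` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: run @cmd (list form, no shell) and collect its
# output, giving up after $timeout seconds. alarm() replaces the forked
# timer process, so no $SIG{CHLD} handler is needed to reap a timer.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;
    my $pid = open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";

    my @output;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "Timeout\n" };
        alarm($timeout);
        @output = <$fh>;    # reads until EOF on the command's stdout
        alarm(0);
        1;
    };
    my $err = $@;
    alarm(0);               # make sure no alarm stays pending if the eval died

    if (!$ok) {
        kill('TERM', $pid); # unlike backticks, we know the PID and can stop it
        close($fh);         # reap the killed child
        die $err;
    }

    close($fh);             # waits for the child and sets $?
    return ($?, @output);
}
```

For example, `my ($status, @lines) = run_with_timeout(1200, './compile_test.sh');` would return the script's wait status in `$status`, or die with "Timeout\n" after 20 minutes.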

Answer 1 (score: 1)

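I thought the quickest fix would be to add a sleep of a second or two at the bottom of the Bash script to wait for the icc zombie to finish, but that didn't work.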

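If I didn't already use SIGALRM (in the real program), I agree the best option would be to wrap the whole thing in an eval, even though for a 500-line program that would be pretty ugly.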

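Without the local($?), every `system` call gets $? = -1. The $? I need in this case is the one set right after the waitpid, but it is unfortunately reset to -1 once the signal handler exits. So I found that this works (new lines are marked with ###):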

my $timer_pid;
my $chld_status;    ###
$SIG{CHLD} = sub {
    local($!, $?);
    while((my $pid = waitpid(-1, WNOHANG)) > 0)
    {
        $chld_status = $?;    ###
        if($pid == $timer_pid)
        {
            die "Timeout\n";
        }
    }
};

...
my @compile = `./compile_test.sh 2>&1`;
my $status = ($? == -1) ? $chld_status : $?;    ###
...
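The same fallback can be made per command. A sketch, assuming you can switch from backticks to a list-form piped `open` (which returns the child's PID): the handler records every status it reaps in a hash keyed by PID, and the caller falls back to that entry when `close` reports -1. `%reaped_status` and `run_status` are hypothetical names.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';

# Wait statuses of children reaped by the SIGCHLD handler, keyed by PID.
my %reaped_status;

$SIG{CHLD} = sub {
    local ($!, $?);    # don't clobber $! / $? of the interrupted code
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $reaped_status{$pid} = $?;
    }
};

# Hypothetical helper: run @cmd and return (wait status, output lines),
# even if the handler above reaped the child before close() could.
sub run_status {
    my (@cmd) = @_;
    my $pid = open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    my @output = <$fh>;
    close($fh);                        # normally waits for the child, sets $?
    my $status = $?;
    my $saved = delete $reaped_status{$pid};
    $status = $saved if $status == -1 && defined $saved;   # handler won the race
    return ($status, @output);
}
```

Keying by PID means the caller never picks up the status of whichever child the handler happened to reap last, which the single `$chld_status` variable above could hold.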

Answer 2 (score: 1)

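We had a similar problem, and this is our solution: leak the write-side file descriptor of a pipe into the grandchild, and read() from the read side until the grandchild exits.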

See also: wait for children and grand-children

use Fcntl;

# OCF scripts invoked by Pacemaker will be killed by Pacemaker with
# a SIGKILL if the script exceeds the configured resource timeout. In
# addition to killing the script, Pacemaker also kills all of the children
# invoked by that script. Because it is a kill, the scripts cannot trap
# the signal and clean up; because all of the children are killed as well,
# we cannot simply fork and have the parent wait on the child. In order
# to work around that, we need the child not to have a parent process
# of the OCF script, and the only way to do that is to grandchild the
# process. However, we still want the parent to wait for the grandchild
# process to exit so that the OCF script exits when the grandchild is
# done and not before. This is done by leaking the write file descriptor
# from pipe() into the grandchild and then the parent reads the read file
# descriptor, thus blocking until it gets IO or the grandchild exits. Since
# the file descriptor is never written to by the grandchild, the parent
# blocks until the grandchild exits.
sub grandchild_wait_exit
{
    # We use "our" instead of "my" for the write side of the pipe. If
    # we did not, then when the sub exits and $w goes out of scope,
    # the file descriptor will close and the parent will exit.
    pipe(my $r, our $w);

    # Enable leaking the file descriptor into the children
    my $flags = fcntl($w, F_GETFD, 0) or warn $!;
    fcntl($w, F_SETFD, $flags & (~FD_CLOEXEC)) or die "Can't set flags: $!\n";

    # Fork the child
    my $child = fork();
    die "Unable to fork: $!\n" unless defined $child;
    if ($child) {
        # We are the parent, waitpid for the child and
        # then read to wait for the grandchild.
        close($w);
        waitpid($child, 0);
        <$r>;
        exit;
    }

    # Otherwise we are the child, so close the read side of the pipe.
    close($r);

    # Fork a grandchild, exit the child.
    if (fork()) {
        exit;
    }

    # Turn off leaking of the file descriptor in the grandchild so
    # that no other process can write to the open file descriptor
    # that would prematurely exit the parent.
    $flags = fcntl($w, F_GETFD, 0) or warn $!;
    fcntl($w, F_SETFD, $flags | FD_CLOEXEC) or die "Can't set flags: $!\n";
}

grandchild_wait_exit();

sleep 1;
print getppid() . "\n";
print "$$: gc\n";
sleep 30;
exit;
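A stripped-down version of the same pipe trick, wrapped around an external command instead of around the current script. Assumptions: `run_and_wait_for_descendants` is a hypothetical helper, and it only catches descendants that keep the inherited write end open (a program that closes its extra descriptors escapes it). In the question's setting, running `./compile_test.sh` through something like this would also wait for a lingering `icc`, provided `icc` keeps the inherited descriptor.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

# Hypothetical helper: run @cmd in a child, then block until the child AND
# every descendant still holding the pipe's write end has exited.
# Returns the direct child's wait status.
sub run_and_wait_for_descendants {
    my (@cmd) = @_;
    pipe(my $r, my $w) or die "pipe: $!\n";

    # Perl sets close-on-exec on new descriptors above $^F; clear it so the
    # write end survives exec() and is inherited by grandchildren too.
    my $flags = fcntl($w, F_GETFD, 0) or die "F_GETFD: $!\n";
    fcntl($w, F_SETFD, $flags & ~FD_CLOEXEC) or die "F_SETFD: $!\n";

    my $pid = fork() // die "fork: $!\n";
    if ($pid == 0) {
        close($r);
        exec(@cmd) or die "exec $cmd[0]: $!\n";
    }

    close($w);            # the parent must not keep a write end open itself
    waitpid($pid, 0);     # reap the direct child
    my $status = $?;

    # Nobody ever writes, so this list-context read simply blocks until EOF,
    # i.e. until every process holding the write end has exited or closed it.
    () = <$r>;
    close($r);
    return $status;
}
```

Unlike the answer's version, this does not re-enable close-on-exec in the grandchildren, so anything they exec also keeps the parent waiting.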