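I'm fully aware that there are plenty of articles explaining the inner workings of parent-child process dynamics. I have gone through them and got my program working, almost. But there is one thing that keeps bugging me, and I can't understand it despite multiple tries.
Problem: despite reaping children, main is not waiting for all children to finish and exits prematurely. I believe I do exit properly from the child process, and I have installed the REAPER for the child processes, so how does main exit before the children finish?
I'm not looking for a solution here, but I need a new direction to work in for the next week. As of now, I feel I have exhausted my options and tried a lot of things, to no avail.
Some background on what I'm trying to achieve:
All in all, I want all children to finish, and only then do I want to proceed with something further. Each child process spawns a bunch of threads, those threads are properly joined by that child process, and the child then goes on to exit with exit(0).
You may observe extra hoopla in the program, but that is just our requirement: we have to hit 5 APIs (engines), but only with a fixed batch size, say 10 at a time for each. I launch a child process for each engine and a thread for each request, then I wait for all threads to finish, join them, and the child process exits. Only then can I submit the next batch of requests to the same engine, and I do this for all engines until I run out of total requests (say 10000).
Each request can take anywhere between 1 second and 2 hours; basically these are CSV reports fetched from an HTTP API.
My problem is that once I have exhausted the total set of requests, I am not able to make MAIN wait until all child processes finish. This is weird, and it is the issue I'm trying to tackle.
Any ideas?
My program output: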
[compuser@lenovoe470:little-stuff]$ perl 07--20190526-batch-processing-using-threads-with-busy-pool-detection-2.pl 12
26710: STARTING TASKS IN BATCHES
26710: RUNNING batch_engine 1_e1 tasks (1 2)
26710: RUNNING batch_engine 2_e2 tasks (3 4)
26710: RUNNING batch_engine 3_e3 tasks (5 6 7)
26710: BUSY_ENGINE: e1.
26710: BUSY_ENGINE: e2.
26710: BUSY_ENGINE: e3.
26710: BUSY_ENGINE: e1.
26710: BUSY_ENGINE: e2.
26710:26712: TASK_ORCHESTRATOR: >> finished batch_engine (2_e2) tasks (3 4)
26710: PID (26712) has finished with status (0). updating proc hash
26710: BUSY_ENGINE: e3.
26710:26713: TASK_ORCHESTRATOR: >> finished batch_engine (3_e3) tasks (5 6 7)
26710:26711: TASK_ORCHESTRATOR: >> finished batch_engine (1_e1) tasks (1 2)
26710: PID (26713) has finished with status (0). updating proc hash
26710: BUSY_ENGINE: e1.
26710: PID (26711) has finished with status (0). updating proc hash
26710: RUNNING batch_engine 4_e2 tasks (8 9)
26710: RUNNING batch_engine 5_e3 tasks (10 11 12)
26710: FINISHED TASKS IN BATCHES
[compuser@lenovoe470:little-stuff]$ 1:26722: TASK_ORCHESTRATOR: >> finished batch_engine (5_e3) tasks (10 11 12)
1:26721: TASK_ORCHESTRATOR: >> finished batch_engine (4_e2) tasks (8 9)
In the above output:
My program:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use POSIX ':sys_wait_h';
use Thread qw(async);
STDOUT->autoflush(1);
# doesn't work
sub reaper {
    my $reaped;
    while (($reaped = waitpid (-1,&WNOHANG) > 0)) {
        print "$$: reaped: $reaped\n";
        sleep(1);
    }
    $SIG{CHLD} = \&reaper;
}
# doesn't work
my @total_tasks = (1 .. shift || 9);
my @engines = (qw/e1 e2 e3/);
my $sizes = { e1 => 2, e2 => 2, e3 => 3, };
my $proc_hash;
my $global_string = "ENGINE";
# source: https://duyanghao.github.io/ways_avoid_zombie_process/
#
sub REAPER {
    local ($!, $?);
    while ( (my $reaped_pid = waitpid(-1, WNOHANG)) > 0 ) {
        if ( WIFEXITED($?) )
        {
            # my
            my $ret_code = WEXITSTATUS($?);
            print "$$: PID ($reaped_pid) has finished with status ($ret_code). updating proc hash\n";
            my $engine_name = $proc_hash->{$reaped_pid};
            delete ($proc_hash->{$reaped_pid});
            delete ($proc_hash->{$engine_name});
            # my
            # original
            #my $ret_code = WEXITSTATUS($?);
            #print "child process:$pid exit with code:$ret_code\n";
            # original
        }
    }
}
#
$SIG{CHLD} = \&REAPER;
sub random_sleep_time {
    return (int(rand(5)+1))
    #return (sprintf "%.2f",(rand(1)+1))
}
sub task_runner {
    my @args = @_;
    my ($batch_engine, $task) = ($args[0]->[0],$args[0]->[1]);
    STDOUT->autoflush(1);
    my $task_time = random_sleep_time();
    sleep ($task_time);
    threads->exit(0);
    #print "$$:".(threads->tid()).": TASK_RUNNER: $global_string ($batch_engine) task ($task) finished in $task_time seconds\n";
    #return;
};
sub task_orchestrator {
    my ($batch_engine, @tasks) = @_;
    my $engine = (split (/_/,$batch_engine))[1];
    my $task_orch_pid = fork();
    die "Failed to fork task_orchestrator\n" if not defined $task_orch_pid;
    if ($task_orch_pid != 0) {
        $proc_hash->{$engine} = $task_orch_pid;
        $proc_hash->{$task_orch_pid} = $engine;
    }
    if ($task_orch_pid == 0) {
        STDOUT->autoflush(1);
        my @tids;
        for (my $i=1 ; $i <= $#tasks ; $i++) { push (@tids,$i) }
        foreach my $task_number (0 .. $#tasks) {
            $tids [$task_number] = threads->create (
                \&task_runner,[$batch_engine,$tasks [$task_number]]
            );
        }
        my $ppid = getppid();
        foreach my $tid (@tids) {$tid->join()}
        print "$ppid:$$: TASK_ORCHESTRATOR: >> finished batch_engine ($batch_engine) tasks (@tasks)\n";
        exit (0);
    }
}
sub update_proc_hash {
    my $finished_pid = waitpid (-1, POSIX->WNOHANG);
    if ($finished_pid > 0) {
        print "$$: PID ($finished_pid) has finished. updating proc hash\n";
        my $engine_name = $proc_hash->{$finished_pid};
        delete ($proc_hash->{$finished_pid});
        delete ($proc_hash->{$engine_name});
    }
}
my $batch=1;
print "$$: STARTING TASKS IN BATCHES\n";
while (@total_tasks) {
    foreach my $engine (@engines) {
        update_proc_hash();
        if (exists $proc_hash->{$engine}) {
            print "$$: BUSY_ENGINE: $engine.\n";
            sleep (1);
            next;
        }
        else {
            my @engine_tasks;
            my $engine_max_tasks = $sizes->{$engine};
            while ($engine_max_tasks-- != 0) {
                my $task = shift @total_tasks;
                push (@engine_tasks,$task) if $task;
            }
            if (@engine_tasks) {
                my $batch_engine = $batch.'_'.$engine;
                print "$$: RUNNING batch_engine $batch_engine tasks (@engine_tasks)\n";
                task_orchestrator ("$batch_engine",@engine_tasks);
                $batch++;
            }
        }
    }
}
REAPER();
print "$$: FINISHED TASKS IN BATCHES\n";
__END__
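Update after 3 days: Thank you, SO community. Once again, I'm grateful to everyone who took the time to look into this and helped spot and correct the problem. Thank you so much.
Allow me to share the new output together with the final program, for everyone's reference.
Output after applying the fix: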
User@Host:/cygdrive/c/bash-home> perl test.pl
22044: STARTING TASKS IN BATCHES
22044: MAIN: engine (e1) is RUNNING batch #1 tasks: (1 2)
22044: MAIN: engine (e2) is RUNNING batch #2 tasks: (3 4 5)
22044: MAIN: engine (e3) is RUNNING batch #3 tasks: (6 7)
41456: TASK_RUNNER: engine (e1) finished batch #1 task #1 in (1.80) seconds
41456: TASK_RUNNER: engine (e1) finished batch #1 task #2 in (1.31) seconds
41456: TASK_ORCHESTRATOR: engine (e1) finished batch #1 tasks in (1.00) seconds.
22044: REAPER: TASK_ORCHESTRATOR pid (41456) has finished with status (0).
18252: TASK_RUNNER: engine (e2) finished batch #2 task #3 in (1.04) seconds
18252: TASK_RUNNER: engine (e2) finished batch #2 task #4 in (1.91) seconds
18252: TASK_RUNNER: engine (e2) finished batch #2 task #5 in (1.63) seconds
18252: TASK_ORCHESTRATOR: engine (e2) finished batch #2 tasks in (1.00) seconds.
22044: REAPER: TASK_ORCHESTRATOR pid (18252) has finished with status (0).
14544: TASK_RUNNER: engine (e3) finished batch #3 task #6 in (1.42) seconds
14544: TASK_RUNNER: engine (e3) finished batch #3 task #7 in (1.84) seconds
14544: TASK_ORCHESTRATOR: engine (e3) finished batch #3 tasks in (1.00) seconds.
22044: REAPER: TASK_ORCHESTRATOR pid (14544) has finished with status (0).
22044: MAIN: engine (e1) is RUNNING batch #4 tasks: (8 9)
22044: MAIN: engine (e2) is RUNNING batch #5 tasks: (10)
37612: TASK_RUNNER: engine (e1) finished batch #4 task #8 in (1.19) seconds
37612: TASK_RUNNER: engine (e1) finished batch #4 task #9 in (1.31) seconds
37612: TASK_ORCHESTRATOR: engine (e1) finished batch #4 tasks in (1.00) seconds.
16300: TASK_RUNNER: engine (e2) finished batch #5 task #10 in (1.53) seconds
16300: TASK_ORCHESTRATOR: engine (e2) finished batch #5 tasks in (1.00) seconds.
22044: ALL ORCHESTRATORS HAVE FINISHED
22044: FINISHED TASKS IN BATCHES
Final working program:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use POSIX ':sys_wait_h';
use threads;
STDOUT->autoflush(1);
my @total_tasks = (1 .. 10);
my $sleep_time = 1;
my @engines = (qw/e1 e2 e3/);
my $sizes = {
    e1 => 2,
    e2 => 3,
    e3 => 2,
};
my $proc_hash;
my $global_string = "engine";
sub REAPER {
    local ($!, $?);
    while ( (my $reaped_pid = waitpid(-1, WNOHANG)) > 0 ) {
        if ( WIFEXITED($?) ) {
            my $ret_code = WEXITSTATUS($?);
            print "$$: REAPER: TASK_ORCHESTRATOR pid ($reaped_pid) has finished with status ($ret_code).\n";
            my $engine_name = $proc_hash->{$reaped_pid};
            delete ($proc_hash->{$reaped_pid});
            delete ($proc_hash->{$engine_name});
        }
    }
}
$SIG{CHLD} = \&REAPER;
sub random_sleep_time { return sprintf ("%.2f",(rand ($sleep_time||5) + 1)) }
sub task_runner {
    STDOUT->autoflush(1);
    my @args = @_;
    my ($batch_engine, $task) = ($args[0]->[0],$args[0]->[1]);
    my ($batch, $engine) = split (/_/,$batch_engine);
    my $task_time = random_sleep_time();
    sleep ($task_time);
    print "$$: TASK_RUNNER: $global_string ($engine) finished batch #$batch task #$task in ($task_time) seconds\n";
    threads->exit(0);
};
sub task_orchestrator {
    my ($batch_engine, @tasks) = @_;
    my ($batch, $engine) = split (/_/,$batch_engine);
    my $task_orch_pid = fork();
    die "Failed to fork task_orchestrator\n" if not defined $task_orch_pid;
    if ($task_orch_pid != 0) {
        $proc_hash->{$engine} = $task_orch_pid;
        $proc_hash->{$task_orch_pid} = $engine;
    }
    if ($task_orch_pid == 0) {
        STDOUT->autoflush(1);
        my @tids;
        my $start_time = time;
        for (my $i=1 ; $i <= $#tasks ; $i++) { push (@tids,$i) }
        foreach my $task_number (0 .. $#tasks) {
            $tids [$task_number] = threads->create (
                \&task_runner,[$batch_engine,$tasks [$task_number]]
            );
        }
        foreach my $tid (@tids) {$tid->join()}
        my $end_time = time;
        my $total_time = sprintf ("%.2f",($end_time - $start_time));
        print "$$: TASK_ORCHESTRATOR: engine ($engine) finished batch #$batch tasks in ($total_time) seconds.\n";
        exit (0);
    }
}
my $batch=1;
print "$$: STARTING TASKS IN BATCHES\n";
while (@total_tasks)
{
    foreach my $engine (@engines)
    {
        if (exists $proc_hash->{$engine})
        {
            sleep (1);
            next;
        }
        else
        {
            my @engine_tasks;
            my $engine_max_tasks = $sizes->{$engine};
            while ($engine_max_tasks-- != 0)
            {
                my $task = shift @total_tasks;
                push (@engine_tasks,$task) if $task;
            }
            if (@engine_tasks)
            {
                my $batch_engine = $batch.'_'.$engine;
                print "$$: MAIN: engine ($engine) is RUNNING batch #$batch tasks: (@engine_tasks)\n";
                task_orchestrator ($batch_engine,@engine_tasks);
                $batch++;
            }
        }
    }
}
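# All 3 below work properly
# Note: the built-in sleep takes whole seconds, so sleep (.2) is truncated to sleep (0)
# and this loop spins without pausing; importing sleep from Time::HiRes gives a real 0.2 s pause.
#sleep (.2) while ((waitpid(-1, WNOHANG)) >= 0);
#sleep (.2) while ((waitpid(-1, WNOHANG)) != -1);
sleep (.2) while ((waitpid(-1, WNOHANG)) > -1);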
print "$$: ALL ORCHESTRATORS HAVE FINISHED\n";
print "$$: FINISHED TASKS IN BATCHES\n";
__END__
Answer 0 (score: 3)
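waitpid can return 0 if there are child processes matching PID that have not yet terminated, and for -1 that goes for any child process. So the non-blocking waitpid in the while loop of your REAPER exits the loop as soon as zero is returned, which is what happens in your code with multiple children.
The 0 return value is what lets us keep waiting as long as there are unterminated child processes, which is what you are asking for.
One way to fix that is to poll for non-negative returns
use warnings;
use strict;
use feature 'say';
use POSIX ':sys_wait_h';
use Time::HiRes qw(sleep);
for (1..4) {
    my $pid = fork // die "Can't fork: $!";
    if ($pid == 0) {
        sleep rand 4;
        say "\tkid $$ exiting";
        exit;
    };
};
while ( (my $kid = waitpid -1, WNOHANG) > -1 ) {
    say "got $kid" if $kid > 0;
    sleep 0.2;
}
This prints
    kid 12687 exiting
got 12687
    kid 12690 exiting
got 12690
    kid 12689 exiting
got 12689
    kid 12688 exiting
got 12688
Please adjust the polling period as suitable. Note that since this catches any child process, it may interfere with other forks if any are still un-waited-for at that point.
Or you can block by waiting
while ( (my $kid = waitpid -1, 0) > -1 ) {
    say "got $kid";
}
where you can now also test for > 0; we only need the loop to terminate once -1 is returned (no more processes), and there will be no 0 returns here since the call is blocking.
The main difference is that the block executes only once a child process has actually exited, so if you need to keep tabs on what some long-running children are doing (and perhaps limit their run time or guard against hung jobs), that is not so easy in this form; you would want non-blocking operation for that.
Note that some details, in particular those related to return values, may differ across systems.
The naive version is to collect PIDs as you fork and then wait for them
foreach my $pid (@pids) {
    my $gone = waitpid $pid, 0;
    say "Process $gone exited with $?" if $gone > 0; # -1 if reaped already
}
with a blocking waitpid for each process. The problem is that if one process runs far longer than the others (or hangs), this loop will be stuck waiting on it.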
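To make the remark about limiting run time concrete, here is a minimal sketch (not part of the answer as posted) of non-blocking polling that also kills children that overstay a limit. The helper name run_with_limit, the choice of SIGTERM, and the 0.1 s polling period are assumptions made for illustration, and the children only sleep as a stand-in for real work.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';
use Time::HiRes qw(sleep time);

# Hypothetical helper: fork one child per entry in @$durations (seconds the
# child sleeps), poll without blocking, and send TERM to any child that has
# run longer than $limit seconds. Returns { pid => 'exited' | 'killed' }.
sub run_with_limit {
    my ($durations, $limit) = @_;
    my (%started, %outcome);

    for my $secs (@$durations) {
        my $pid = fork // die "Can't fork: $!";
        if ($pid == 0) {            # child: placeholder for the real job
            sleep($secs);
            exit 0;
        }
        $started{$pid} = time();
    }

    while (%started) {
        my $kid = waitpid(-1, WNOHANG);
        if ($kid > 0) {             # a child finished: record how it ended
            if (exists $started{$kid}) {
                # low 7 bits of $? hold the signal number, if any
                $outcome{$kid} = ($? & 127) ? 'killed' : 'exited';
                delete $started{$kid};
            }
            next;                   # check right away for more finished children
        }
        last if $kid == -1;         # no child processes left at all

        # $kid == 0: children exist but none has finished yet
        for my $pid (keys %started) {
            kill 'TERM', $pid if time() - $started{$pid} > $limit;
        }
        sleep(0.1);
    }
    return \%outcome;
}

# One short child and one that would otherwise run for 30 seconds.
my $result = run_with_limit([0.2, 30], 1);
print "$_: $result->{$_}\n" for sort keys %$result;
```

The loop keeps going on a 0 return and stops only on -1, which is the same termination condition as the polling loop above; the time check is the extra work that a blocking waitpid would not leave room for.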
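Answer 1 (score: 1)
On exiting the main loop, you call REAPER(), which does a waitpid() with no blocking. No blocking. None. And it doesn't block. So it's exiting.
While I'm here, I noticed that your update_proc_hash() function doesn't loop like the other things that do waitpid(), so it isn't catching everything it could. Do yourself a favor and tidy it all up neatly.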
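A rough sketch of what that tidying could look like, under some assumptions: %proc mirrors the question's $proc_hash, start_engine is a hypothetical stand-in for task_orchestrator whose child only sleeps, and forget, reap_finished and wait_for_all are names invented here. No CHLD handler is installed in this sketch, so all reaping happens in these two loops.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';

# pid => engine and engine => pid, mirroring $proc_hash in the question
my %proc;

# Hypothetical stand-in for task_orchestrator: the child only sleeps.
sub start_engine {
    my ($engine, $secs) = @_;
    my $pid = fork // die "Can't fork: $!";
    if ($pid == 0) { sleep $secs; exit 0 }
    @proc{$engine, $pid} = ($pid, $engine);
    return $pid;
}

# Drop both bookkeeping entries for a reaped child.
sub forget {
    my ($pid) = @_;
    my $engine = delete $proc{$pid};
    delete $proc{$engine} if defined $engine;
}

# Non-blocking: reap every child that has already finished, then return.
sub reap_finished {
    my $count = 0;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) { forget($pid); $count++ }
    return $count;
}

# Blocking: return only once waitpid reports -1, i.e. no children remain.
sub wait_for_all {
    my $count = 0;
    while ((my $pid = waitpid(-1, 0)) > 0) { forget($pid); $count++ }
    return $count;
}

start_engine($_, 1) for qw(e1 e2 e3);
my $early = reap_finished();   # call this inside the scheduling loop
my $late  = wait_for_all();    # call this once, after the scheduling loop
print "reaped $early early, $late at the end; entries left: ",
      scalar(keys %proc), "\n";
```

The non-blocking loop belongs where update_proc_hash() is called today, and the blocking one replaces the final REAPER() call, so main cannot reach its last print while orchestrators are still running.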