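I have a script that does two jobs: A. Connect to a database and retrieve data from it. B. Process the retrieved data.
So I created two threads, one for each job. I also use Parallel::ForkManager to spawn child processes that carry out job A, so that I can connect to several databases at once.
The problem I have been trying to solve appears when I add the blocking call $pm->wait_all_children: job A never runs its run_on_finish callback when a child completes.
If I remove the second job (B) from the @threads array, the callback does run. So I suspect I misunderstand something about threads, but I thought the two threads should not block each other. If they don't, what could be preventing my child processes from finishing their work?
As a design note, I can't run job B after each job A finishes, because job B is very expensive and may run for a long time, which would hold up the remaining job A work. I would rather have a separate thread that periodically processes all the retrieved data in "batches".
Here is sample code that reproduces the problem: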
use Data::Dumper;
use Parallel::ForkManager;
use threads;
use threads::shared;

my $isRunningJobThread :shared;
my @jobs = ('a', 'b', 'c', 'd', 'e', 'f', 'g');

my @threads = (
    threads->create(\&jobThread),
    threads->create(\&compileCompletedJobThread)
);
$_->join for @threads;
print "All done\n";

sub jobThread {
    my $pm = Parallel::ForkManager->new(5);
    $pm->run_on_finish(sub {
        print "Job done\n";
    });
    $isRunningJobThread = 1;
    foreach my $job (@jobs) {
        $pm->start and next;
        print "Do job for : $job\n";
        $pm->finish;
    }
    $pm->wait_all_children;
}

sub compileCompletedJobThread {
    while ($isRunningJobThread) {
        sleep 10;
        print "Compiling completed jobs\n";
    }
}
Answer 0 (score: 3)
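You are mixing threads and fork. That really is a horrible idea. Both are implementation dependent to some extent. You can be fairly sure each one works on its own, but using both at the same time is asking for a world of pain (concurrency issues, race conditions and so on).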
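In this particular case, you need to understand that fork() takes a complete copy of your process, in exactly the same state, with a single difference: the return code of fork(). That means the state of your threads gets cloned by fork() along with everything else. Parallel::ForkManager hides some of this from you by limiting the scope of the parallelism, but that is what is happening behind the scenes.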
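To make the "complete copy, different return code" point concrete, here is a minimal sketch using only core Perl. The variable `$counter` is hypothetical and exists purely for illustration; the script assumes a platform with a native fork() (such as Linux) and does not use threads at all.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical variable, only here to show that the child works on a copy.
my $counter = 1;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: fork() returned 0. It starts with its own copy of $counter.
    $counter += 100;
    print "child  ($$): counter = $counter\n";
    exit 0;
}

# Parent: fork() returned the child's PID. Wait for the child to exit.
waitpid( $pid, 0 );
print "parent ($$): counter = $counter\n";
```

The child reports 101 while the parent still reports 1, because each process has its own copy of the data after the fork. Whatever state your threads were in at the moment of the fork is part of that copy too.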
I would urge you to step back and do a rewrite. What you seem to be doing is much better suited to a few worker threads and Thread::Queue:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Thread::Queue;

#parallelism limit
my $num_threads = 5;

#input and output queues
my $work_q   = Thread::Queue->new;
my $result_q = Thread::Queue->new;

#jobs as before
my @jobs = ( 'a', 'b', 'c', 'd', 'e', 'f', 'g' );

#worker - reads from queue one item at a time.
#exits when dequeue returns undef, which happens once the queue has been `end`ed and drained.
#defined() keeps a false item such as 0 or '' from ending the loop early.
sub worker {
    while ( defined( my $item = $work_q->dequeue ) ) {
        print threads->self->tid . ": processing work item $item\n";
        #pretend we did some work, queue the result.
        $result_q->enqueue( threads->self->tid . ": finished $item" );
    }
}

#spawn threads
threads->create( \&worker ) for 1 .. $num_threads;

#queue jobs
$work_q->enqueue(@jobs);

#close queue, so threads will exit when they hit the end of the queue.
#dequeue will return 'undef' rather than blocking.
$work_q->end;

#join all the threads.
$_->join for threads->list;

#all threads are finished, so we close the result queue.
#again - so dequeue is undef when empty, rather than just blocking.
$result_q->end;

while ( defined( my $result = $result_q->dequeue ) ) {
    print "Got result of $result\n";
}
print "All done\n";
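Since you indicate that you also want to process the 'result_q' concurrently, you can run a 'result handler' as another thread, to much the same effect.
This gets slightly itchy, because your 'exit' condition now depends on which queues are open and which are closed. But something like this: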
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Thread::Queue;

#parallelism limit
my $num_threads = 5;

#input and output queues
my $work_q   = Thread::Queue->new;
my $result_q = Thread::Queue->new;

#jobs as before
my @jobs = ( 'a', 'b', 'c', 'd', 'e', 'f', 'g' );

#worker - reads from queue one item at a time.
#exits when dequeue returns undef, which happens once the queue has been `end`ed and drained.
#defined() keeps a false item such as 0 or '' from ending the loop early.
sub worker {
    while ( defined( my $item = $work_q->dequeue ) ) {
        print threads->self->tid . ": processing work item $item\n";
        #pretend we did some work, queue the result.
        $result_q->enqueue( threads->self->tid . ": finished $item" );
    }
}

#a thread to process the results in parallel
sub collator {
    while ( defined( my $result = $result_q->dequeue ) ) {
        print "Got result of $result\n";
    }
}

#spawn threads
my @workers  = map { threads->create( \&worker ) } 1 .. $num_threads;
my $collator = threads->create( \&collator );

#queue jobs
$work_q->enqueue(@jobs);

#close queue, so threads will exit when they hit the end of the queue.
#dequeue will return 'undef' rather than blocking.
$work_q->end;

#join all the worker threads.
$_->join for @workers;

#all workers are finished, so we close the result queue.
#again - so dequeue is undef when empty, rather than just blocking.
$result_q->end;

#reap 'collator' once it's finished.
$collator->join;
print "All done\n";
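It is much the same as above, but it keeps an explicit list of 'workers'. That way you can end $work_q and wait for the workers to exit (and join them). At that point you know no more results will go into $result_q, so you can end that queue as well and then wait for collator to exit.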
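If you want the results handled in "batches" rather than one at a time, the collator can block for the first result and then drain whatever else is already queued. The following is only a sketch under assumptions: `batch_collator` is a hypothetical name, the main thread stands in for the workers, and it relies on `dequeue_nb` and `end` from Thread::Queue (3.01 or later) on a threads-enabled perl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $result_q = Thread::Queue->new;

# Hypothetical batch collator: blocks until at least one result is
# available, then takes everything else already waiting without blocking.
sub batch_collator {
    my $batches = 0;
    while ( defined( my $first = $result_q->dequeue ) ) {
        my @batch = ($first);
        while ( defined( my $more = $result_q->dequeue_nb ) ) {
            push @batch, $more;
        }
        $batches++;
        print "Compiling batch of " . scalar(@batch) . ": @batch\n";
    }
    return $batches;    # number of batches handled
}

my $collator = threads->create( \&batch_collator );

# Stand-in for the worker threads: queue some results, then close the queue.
$result_q->enqueue("finished $_") for 'a' .. 'g';
$result_q->end;

my $batches = $collator->join;
print "Processed $batches batch(es)\n";
```

How many batches you get depends on timing, so the count will vary between runs. Adding a `sleep` at the end of the outer loop would make the batches larger and less frequent, at the cost of results waiting longer in the queue.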