In Perl

Date: 2017-03-13 05:40:46

Tags: multithreading perl parallel-processing

I have a script that does two jobs:
A. Connect to a database and retrieve data from it.
B. Process the retrieved data.

So I created one thread for each job. I also use Parallel::ForkManager to spawn child processes that run job A, so that I can connect to several databases at once.

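The problem I've been trying to solve: when I add the blocking call to ForkManager->wait_all_children, the run_on_finish callback for job A never runs when a job completes.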

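If I remove the second job (B) from the @threads array, the callback does run. So I suspect I'm misunderstanding something about threads, but as far as I can tell the two threads shouldn't block each other. If that's not the cause, what could be preventing my child processes from finishing their work?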

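As a design note: I can't run job B every time a job A finishes, because job B is very expensive and can run for a long time, so it would hold up the remaining A jobs. Instead, I'd rather have a separate thread that periodically processes all the retrieved data as a "batch".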

Here is sample code that reproduces the problem:

use Data::Dumper;
use Parallel::ForkManager;
use threads;
use threads::shared;

my $isRunningJobThread :shared;    

my @jobs = ('a', 'b', 'c', 'd', 'e', 'f', 'g');
my @threads = (
    threads->create(\&jobThread),
    threads->create(\&compileCompletedJobThread)
);

$_->join for @threads;

print "All done\n";

sub jobThread{
    my $pm = Parallel::ForkManager->new(5);
    $pm->run_on_finish(sub {
        print "Job done\n";
    });

    $isRunningJobThread = 1;
    foreach my $job (@jobs) {
        $pm->start and next;
        print "Do job for : $job\n";
        $pm->finish;
    }

    $pm->wait_all_children;
}

sub compileCompletedJobThread{
    while($isRunningJobThread) {
        sleep 10;

        print "Compiling completed jobs\n";
    }
}

1 Answer:

Answer 0 (score: 3)

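OK, wow. Slow down there. You're using both threads and fork in the same code. That's a really bad idea. Their behaviour is somewhat implementation-dependent: you can be fairly sure they will work, but used together they're asking for a world of pain (concurrency issues, race conditions, and so on).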

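In this particular case, you need to understand what fork() does: it takes a complete copy of the process, in exactly the same state, with a single difference, namely the return value of fork() (0 in the child, the child's PID in the parent). That means your threads can be cloned by fork() as well; perlthrtut warns that some platforms copy every running thread into the child while others copy only the thread that called fork(). Parallel::ForkManager hides some of this from you by limiting how much runs in parallel, but that is what's happening behind the scenes.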
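As a minimal illustration of that copy (plain fork() in Perl with no threads involved, assuming a Unix-like system where fork() is native rather than emulated), the child below changes its own copy of a variable and the parent's copy is unaffected; only the return value of fork() tells the two processes apart. The variable name is just for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The child gets a full copy of this variable at the moment of fork().
my $counter = 1;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: fork() returned 0. Changing $counter only touches the child's copy.
    $counter = 100;
    print "child (pid $$) sees counter = $counter\n";
    exit 0;
}

# Parent: fork() returned the child's PID. Wait for the child to finish.
waitpid( $pid, 0 );
print "parent sees counter = $counter\n";
```

Running it should print the child's line with 100 first, then the parent's line with 1: the two processes stopped sharing state the instant fork() returned.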

I'd urge you to take a step back and rewrite this: what you seem to be doing is better suited to a few worker threads and Thread::Queue.

#!/usr/bin/env perl
use strict;
use warnings;

use threads;
use Thread::Queue;

# parallelism limit
my $num_threads = 5;

# input and output queues
my $work_q   = Thread::Queue->new;
my $result_q = Thread::Queue->new;

# jobs as before
my @jobs = ( 'a', 'b', 'c', 'd', 'e', 'f', 'g' );

# worker - reads from the queue one item at a time.
# dequeue returns undef (ending the loop) once the queue has been `end`ed
# and drained; `defined` stops an item such as '0' from ending it early.
sub worker {
   while ( defined( my $item = $work_q->dequeue ) ) {
      print threads->self->tid . ": processing work item $item\n";
      # pretend we did some work, queue the result.
      $result_q->enqueue( threads->self->tid . ": finished $item" );
   }
}

# spawn threads
threads->create( \&worker ) for 1 .. $num_threads;

# queue jobs
$work_q->enqueue(@jobs);

# close the queue, so workers exit when they reach its end:
# dequeue will return undef rather than blocking.
$work_q->end;

# join all the threads.
$_->join for threads->list;

# all workers are finished, so close the result queue as well,
# so that dequeue returns undef when it is empty rather than blocking.
$result_q->end;

while ( defined( my $result = $result_q->dequeue ) ) {
    print "Got result of $result\n";
}

print "All done\n";

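Since you indicated that you want to process the results in result_q concurrently, you can also run the "result handler" as another thread, with much the same outcome.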

This makes it slightly trickier, because your "exit" conditions depend on which queues are open or closed. But something like this:

#!/usr/bin/env perl
use strict;
use warnings;

use threads;
use Thread::Queue;

#parallelism limit
my $num_threads = 5;

#input and output queues
my $work_q   = Thread::Queue->new;
my $result_q = Thread::Queue->new;

#jobs as before
my @jobs = ( 'a', 'b', 'c', 'd', 'e', 'f', 'g' );

#worker - reads from the queue one item at a time.
#dequeue returns undef (ending the loop) once the queue has been `end`ed and drained.
sub worker {
   while ( defined( my $item = $work_q->dequeue ) ) {
      print threads->self->tid . ": processing work item $item\n";

      #pretend we did some work, queue the result.
      $result_q->enqueue( threads->self->tid . ": finished $item" );
   }
}

#a thread to process the results in parallel
sub collator {
   while ( defined( my $result = $result_q->dequeue ) ) {
      print "Got result of $result\n";
   }
}

#spawn threads
my @workers = map { threads->create( \&worker ) } 1 .. $num_threads;
my $collator = threads->create( \&collator );

#queue jobs
$work_q->enqueue(@jobs);

#close queue, so threads will exit when they hit the end of the queue.
#dequeue will return 'undef' rather than blocking.
$work_q->end;

#join all the threads.
$_->join for @workers;

#all workers are finished, so we close the result queue as well,
#so the collator's dequeue returns undef once it is empty, rather than blocking.
$result_q->end;

#reap 'collator' once it's finished.
$collator->join;


print "All done\n";

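It's almost the same as above, but it keeps a list of the "workers" it spawns, because that lets you end $work_q, wait for the workers to exit (join them), and then know that no more results will arrive in $result_q. So you can end $result_q and then join the collator (that is, wait for it to exit).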
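The question asked for periodic "batch" processing rather than handling results one at a time. A minimal sketch of one way to get that with the same worker/collator layout, assuming Thread::Queue 3.01 or later (for end): the collator blocks until at least one result arrives, then uses the queue's pending and dequeue_nb methods to take everything else already waiting, and hands the whole batch over at once. The names batch_collator and process_batch are hypothetical; process_batch stands in for the expensive job B.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use threads;
use Thread::Queue;

my $num_threads = 5;
my $work_q      = Thread::Queue->new;
my $result_q    = Thread::Queue->new;
my @jobs        = ( 'a' .. 'g' );

sub worker {
    while ( defined( my $item = $work_q->dequeue ) ) {
        $result_q->enqueue( threads->self->tid . ": finished $item" );
    }
}

# Hypothetical stand-in for the expensive "job B": handles a whole batch.
sub process_batch {
    my @batch = @_;
    print "Processing batch of ", scalar(@batch), " result(s)\n";
    return scalar @batch;
}

# Hypothetical batch collator: returns how many results it processed.
sub batch_collator {
    my $total = 0;
    # dequeue blocks until a result arrives; it returns undef only
    # once the queue has been ended and drained.
    while ( defined( my $first = $result_q->dequeue ) ) {
        my @batch = ($first);
        # Take whatever else is already waiting, without blocking.
        # pending returns undef once the queue is ended and empty.
        my $pending = $result_q->pending;
        push @batch, $result_q->dequeue_nb($pending) if $pending;
        $total += process_batch(@batch);
    }
    return $total;
}

my @workers = map { threads->create( \&worker ) } 1 .. $num_threads;

# Created in scalar context, so join returns batch_collator's scalar result.
my $collator = threads->create( \&batch_collator );

$work_q->enqueue(@jobs);
$work_q->end;
$_->join for @workers;

# No more results can arrive, so end the result queue and reap the collator.
$result_q->end;
my $processed = $collator->join;
print "Processed $processed results in total\n";
```

How the seven results get split into batches depends on thread timing, but the final line should report all 7. The design choice here is that the batch size adapts to load: while one batch is being processed, new results accumulate and are picked up together next time, instead of waking on a fixed sleep interval.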