Perl multithreaded code gets killed partway through

Date: 2016-08-30 01:51:22

Tags: multithreading perl

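I have a piece of multithreaded Perl code, shown below. I have two questions:

  1. The @correlatedPttns array has about 500 elements. While the program runs, I use htop on Linux to check the number of running processes, and only 3 seem to be in use. Shouldn't more threads be created?

  2. The program crashes partway through, after it has finished about 140 of the $pair items. Why does this happen? If I run the same code on a small number of @correlatedPttns, it runs fine.

Thanks!

Code: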

    my @threads = ();
    foreach my $pair (@correlatedPttns)
    {    
       # slice the data out
        my @tmp = ();
        for (my $x = 0; $x<$cnt; $x++)
        {    
            push @tmp,[ @{ $data[$x] } [ 0, @$pair[0], @$pair[1] ] ]; 
        }    
       push (@threads, threads->create (\&thread_func, $pair, \@pttnIndexMap, \@tmp,$cnt, $intervalOutput));
    }
    
    foreach (@threads)
    {
       $_->join(); # blocks until this thread exits
    }
    

2 answers:

Answer 0 (score: 0)

OK, stop for a moment. Step back and reconsider your understanding of threads.

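In Perl, a thread is not the lightweight thing you seem to be assuming. It is a complete copy of your program. If you start 500 threads, you will quickly run out of system resources. Thread startup is not a lightweight operation either, so doing what you are doing is actually a bad idea. (Doing it with fork() is less of a problem, if you really want to go that way.)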
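To make the "complete copy" point concrete, here is a minimal sketch, assuming a Perl built with ithreads. The variable names are made up for illustration. A thread modifies an ordinary (non-shared) array, and the parent never sees the change because the thread was working on its own copy.

```perl
use strict;
use warnings;
use threads;

# A plain (non-shared) variable: each new thread gets its own private copy.
my @big = ( 1 .. 5 );

# Created in scalar context, so join() returns a single scalar.
my $thr = threads->create( sub {
    push @big, 6;          # changes only this thread's copy
    return scalar @big;    # element count as seen inside the thread
} );

my $count_in_thread = $thr->join();
my $count_in_parent = scalar @big;

print "thread saw $count_in_thread elements, parent still has $count_in_parent\n";
```

With 500 threads, every one of them carries its own copy of whatever data the program had loaded at creation time, which is where the memory goes.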

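If I had to guess (and I do have to, because you haven't given us enough information), I'd say your "crash" is caused by running out of memory, and your process is being killed. /var/log/messages or a similar log should tell you.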

I don't know why htop isn't showing what you expect. Try ps -efT (the -T option lists threads) and see whether that does.

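However, I would strongly urge you to rewrite your code. If you have a series of uncoupled items to process, it is much more sensible to use Thread::Queue and a pool of worker threads.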

Rewrite thread_func like this:

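use threads;
use Thread::Queue;

my $workers = 10;    # size of the worker pool
my $work_q  = Thread::Queue->new();

sub thread_func {
    # dequeue blocks until an item arrives; it returns undef once the queue is ended and empty
    while ( my $pair = $work_q->dequeue ) {
        ## all the messing around you do with each pair;
    }
}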

In your main code:

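threads->create( \&thread_func ) for 1 .. $workers;    # note: the class is "threads", not "thread"
$work_q->enqueue(@correlatedPttns);
$work_q->end;
foreach my $thr ( threads->list ) {
    $thr->join();
}

This starts 10 workers, each of which takes one pair at a time from $work_q and "works" on it. You can use a result queue in a similar way to collate any results.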
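For the result queue idea, something along these lines might work. This is only a sketch: the pool size, the three sample pairs, and the "add the two numbers" body are stand-ins for the real processing, and it assumes a Thread::Queue version recent enough to provide end().

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $workers  = 4;                        # illustrative pool size
my $work_q   = Thread::Queue->new();
my $result_q = Thread::Queue->new();

sub worker {
    # dequeue returns undef once the work queue is ended and empty
    while ( defined( my $pair = $work_q->dequeue() ) ) {
        my ( $i, $j ) = @$pair;
        $result_q->enqueue( "$i,$j=" . ( $i + $j ) );    # stand-in for real work
    }
}

threads->create( \&worker ) for 1 .. $workers;

my @pairs = ( [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] );            # hypothetical work items
$work_q->enqueue(@pairs);
$work_q->end();

$_->join() for threads->list();

# All workers are done, so drain whatever they produced.
my @results;
while ( defined( my $r = $result_q->dequeue_nb() ) ) {
    push @results, $r;
}
@results = sort @results;    # workers finish in no particular order
print "$_\n" for @results;
```

Results arrive in whatever order the workers finish, so anything that needs a stable order should carry its own key (here the pair itself) and be sorted afterwards.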

I'd also point out that maintaining your own list of threads is redundant, because threads->list will do that for you.
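A small sketch of that, with throwaway thread bodies: threads->list returns the thread objects that have not yet been joined or detached, and in scalar context it returns their count.

```perl
use strict;
use warnings;
use threads;

# Start three threads without keeping the returned objects anywhere.
threads->create( sub { return } ) for 1 .. 3;

my $before = threads->list();    # scalar context: number of unjoined threads

$_->join() for threads->list();  # list context: the thread objects themselves

my $after = threads->list();     # nothing left to join

print "before join: $before, after join: $after\n";
```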

Answer 1 (score: 0)

Here is Design 1, following your suggestions:

my $workers = Sys::CPU::cpu_count();
my $work_q = Thread::Queue -> new();

threads->create (\&thread_func, \@pttnIndexMap, \@data, $cnt, $intervalOutput) for 1..$workers; 
$work_q -> enqueue ( @correlatedPttns ); 
$work_q -> end; 

foreach my $thr ( threads -> list() )
{    
    $thr -> join(); 
} 

In thread_func:

  sub thread_func
  {
    # other setup omitted here
    while ( my $pair = $work_q -> dequeue() )
    {
         #get the element from @data (2D array) based on the $pair information and run processing
    }
  }

Here is Design 2, which limits the number of threads in my original implementation:

my $number_of_cpus = Sys::CPU::cpu_count();
my $threadCnt = 0; 

foreach my $pair (@correlatedPttns)
{    
   # slice the data out
    my @tmp = ();
    for (my $x = 0; $x<$cnt; $x++)
    {
        push @tmp,[ @{ $data[$x] } [ 0, @$pair[0], @$pair[1] ] ]; # epochtime, Pi, Pj
    }

    threads->create (\&thread_func, $pair, \@pttnIndexMap, \@tmp,$cnt, $intervalOutput);
    $threadCnt++;

    if ($threadCnt >= $number_of_cpus)
    {
        $threadCnt = 0; 
        foreach my $thr ( threads -> list() )
        {
            $thr -> join(); 
        }
    }

}    

foreach my $thr ( threads -> list() )
{    
    $thr -> join();
}

Both designs now work fine with the large dataset. However, I notice a big difference in speed: Design 1 is much slower than Design 2. In Design 1, $workers threads run, but each thread uses less than 30% CPU. In Design 2, 3 threads run at a time, and each uses 100% CPU.

The execution time for Design 1 is almost the same as the sequential version (with no multithreading at all). Why? In Design 1, @data (a 2D array) is used inside thread_func, and each thread has to access certain columns of that array based on the $pair information. Does this slow down the multithreaded processing? In Design 2, I slice those columns out before sending them to thread_func, so that each thread is independent.
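For anyone reading along, this is what the slicing step in Design 2 does, shown on a tiny made-up @data (the row values and the pair are hypothetical). It only illustrates the slice itself and says nothing about whether that is the cause of the speed difference.

```perl
use strict;
use warnings;

# Hypothetical 2D data: each row is [ epochtime, P1, P2, P3 ]
my @data = (
    [ 100, 11, 12, 13 ],
    [ 200, 21, 22, 23 ],
);
my $pair = [ 1, 3 ];    # column indices of the two patterns in this pair

# Keep column 0 (epochtime) plus the two columns named by the pair, row by row.
my @tmp = map { [ @{$_}[ 0, $pair->[0], $pair->[1] ] ] } @data;

print join( ',', @$_ ), "\n" for @tmp;
```

Each thread in Design 2 receives only this three-column @tmp rather than a reference to the full @data.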

Thanks a lot!!