Perl system calls hang when using threads

Time: 2013-08-16 07:30:49

Tags: multithreading perl system-calls

I'm new to Perl, so please forgive my ignorance. (I'm on Windows 7.)

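I borrowed echicken's threads example script and wanted to use it as the basis for a script that makes a large number of system calls, but I've run into a problem that is beyond my understanding. To illustrate what I'm seeing, the sample code below runs a simple ping command.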

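  • $nb_process is the number of threads allowed to run concurrently.
  • $nb_compute is the number of times we want to run the subroutine (i.e. the total number of times the ping command will be issued).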

When I set $nb_compute and $nb_process to the same value, it works perfectly.

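However, when I reduce $nb_process (to limit the number of threads running at any one time), it seems to lock up as soon as the number of threads defined in $nb_process has been started.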

If I remove the system call (the ping command), it works fine.

I see the same behaviour with other system calls (it isn't just ping).

Can someone please help? The script is below.

#!/opt/local/bin/perl -w  
 use threads;  
 use strict;  
 use warnings;  

 my @a = ();  
 my @b = ();  


 sub sleeping_sub ( $ $ $ ); 

 print "Starting main program\n";  

 my $nb_process = 3;  
 my $nb_compute = 6;  
 my $i=0;  
 my @running = ();  
 my @Threads;  
 while (scalar @Threads < $nb_compute) {  

     @running = threads->list(threads::running);  
     print "LOOP $i\n";  
     print "  - BEGIN LOOP >> NB running threads = ".(scalar @running)."\n";  

     if (scalar @running < $nb_process) {  
         my $thread = threads->new( sub { sleeping_sub($i, \@a, \@b) });  
         push (@Threads, $thread);  
         my $tid = $thread->tid;  
         print "  - starting thread $tid\n";  
     }  
     @running = threads->list(threads::running);  
     print "  - AFTER STARTING >> NB running Threads = ".(scalar @running)."\n";  
     foreach my $thr (@Threads) {  
         if ($thr->is_running()) {  
             my $tid = $thr->tid;  
             print "  - Thread $tid running\n";  
         }  
         elsif ($thr->is_joinable()) {  
             my $tid = $thr->tid;  
             $thr->join;  
             print "  - Results for thread $tid:\n";  
             print "  - Thread $tid has been joined\n";  
         }  
     }  

     @running = threads->list(threads::running);  
     print "  - END LOOP >> NB Threads = ".(scalar @running)."\n";  
     $i++;  
 }  

 print "\nJOINING pending threads\n";  
 while (scalar @running != 0) {  
    foreach my $thr (@Threads) {  
         $thr->join if ($thr->is_joinable());  
     }  
     @running = threads->list(threads::running);  
}  
 print "NB started threads = ".(scalar @Threads)."\n";  
 print "End of main program\n";  


 sub sleeping_sub ( $ $ $ ) { 
    my @res2 = `ping 136.13.221.34`; 
    print "\n@res2";
    sleep(3);  
 } 

1 answer:

Answer 0 (score: 3)

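The main problem with your program is that you have a busy loop that tests whether a thread can be joined. This is wasteful. You could also reduce the number of global variables, which would make your code easier to understand.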

Other eyebrow-raisers:

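  • Never use prototypes unless you know exactly what they mean.
  • sleeping_sub does not use any of its arguments.
  • You use the threads::running list a lot without considering whether it is actually the right thing to check.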
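
To illustrate the first point, here is a minimal sketch of how a `($$)` prototype silently changes what a sub receives. The two helper names are hypothetical and exist only for this demonstration; they are not part of the question's script.

```perl
use strict; use warnings;

# Hypothetical helpers, named only for this illustration.
sub args_with_proto ($$) { return join '/', @_ }
sub args_without_proto   { return join '/', @_ }

my @list = (10, 20, 30);

# The "$" prototype forces scalar context on each argument,
# so @list is passed as its element count, not as its elements.
my $with    = args_with_proto(@list, 1);

# Without a prototype, @list is flattened into the argument list as usual.
my $without = args_without_proto(@list, 1);

print "with prototype:    $with\n";     # 3/1
print "without prototype: $without\n";  # 10/20/30/1
```

Since sleeping_sub ignores its arguments anyway, the prototype in the question adds nothing, and dropping it avoids this kind of surprise.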

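You seem to want only N workers running at any one time, but M workers launched in total. Here is a fairly elegant way to implement that. The main idea is to have a queue between the threads, onto which a thread that has just finished enqueues its own thread ID. The main thread then joins that thread. To limit the number of threads, we use a semaphore: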

use threads; use strict; use warnings;
use feature 'say';  # "say" works like "print", but appends newline.
use Thread::Queue;
use Thread::Semaphore;

my @pieces_of_work = 1..6;
my $num_threads = 3;
my $finished_threads = Thread::Queue->new;
my $semaphore = Thread::Semaphore->new($num_threads);

for my $task (@pieces_of_work) {
  $semaphore->down;  # wait for permission to launch a thread

  say "Starting a new thread...";

  # create a new thread in scalar context
  threads->new({ scalar => 1 }, sub {
    my $result = worker($task);                # run actual task
    $finished_threads->enqueue(threads->tid);  # report as joinable "in a second"
    $semaphore->up;                            # allow another thread to be launched
    return $result;
  });

  # maybe join some threads
  while (defined( my $thr_id = $finished_threads->dequeue_nb )) {
    join_thread($thr_id);
  }
}

# wait for all threads to be finished, by "down"ing the semaphore:
$semaphore->down for 1..$num_threads;
# end the finished thread ID queue:
$finished_threads->enqueue(undef);

# join any threads that are left:
while (defined( my $thr_id = $finished_threads->dequeue )) {
  join_thread($thr_id);
}

where join_thread and worker are defined as

sub worker {
  my ($task) = @_;
  sleep rand 2; # sleep random amount of time
  return $task + rand; # return some number
}

sub join_thread {
  my ($tid) = @_;
  my $thr = threads->object($tid);
  my $result = $thr->join;
  say "Thread #$tid returned $result";
}

We might get this output:

Starting a new thread...
Starting a new thread...
Starting a new thread...
Starting a new thread...
Thread #3 returned 3.05652608754778
Starting a new thread...
Thread #1 returned 1.64777186731541
Thread #2 returned 2.18426146087901
Starting a new thread...
Thread #4 returned 4.59414651998983
Thread #6 returned 6.99852684265667
Thread #5 returned 5.2316971836585

(The order and the return values are not deterministic.)

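Using a queue makes it easy to tell which thread has finished. Semaphores make it easier to protect a resource or to limit the amount of parallelism.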
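
As a small, self-contained sketch of the "limit the amount of parallelism" use, the following variation has each thread take the semaphore itself (rather than the main thread taking it before spawning, as in the code above). The `$active` and `$peak` counters are assumptions added purely to observe the limit; the short `select` call merely stands in for real work.

```perl
use threads; use threads::shared; use strict; use warnings;
use Thread::Semaphore;

my $limit     = 2;
my $semaphore = Thread::Semaphore->new($limit);

# Shared counters, used only to observe how many threads are inside
# the guarded section at once.
my $active :shared = 0;
my $peak   :shared = 0;

my @threads = map {
  threads->new(sub {
    $semaphore->down;                   # blocks while $limit threads are inside
    { lock($active); $active++; $peak = $active if $active > $peak; }
    select(undef, undef, undef, 0.1);   # stand-in for real work (sleeps 0.1 s)
    { lock($active); $active--; }
    $semaphore->up;                     # let the next waiting thread in
  });
} 1 .. 5;

$_->join for @threads;
print "peak concurrency was $peak (limit $limit)\n";
```

All five threads are created up front here, so this only limits how many do work at once, not how many exist; the approach in the answer's code also limits the number of live threads.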

The main benefit of this pattern over a busy loop is that it uses far less CPU. That also shortens the overall execution time.

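While this is a big improvement, we can do better! Spawning threads is expensive: it is basically a fork(), but without all the copy-on-write optimizations available on Unix systems. The whole interpreter is copied, including every variable and all state you have already created.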

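Threads should therefore be used sparingly, and spawned as early as possible. I have already introduced you to queues, which can pass values between threads. We can extend this idea so that a few worker threads continually pull work from an input queue and return results through an output queue. The difficulty now is getting the last thread that exits to finish the output queue.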

use threads; use strict; use warnings;
use feature 'say';
use Thread::Queue;
use Thread::Semaphore;

# define I/O queues
my $input_q  = Thread::Queue->new;
my $output_q = Thread::Queue->new;

# spawn the workers
my $num_threads = 3;
my $all_finished_s = Thread::Semaphore->new(1 - $num_threads); # a negative start value!
my @workers;
for (1 .. $num_threads) {
  push @workers, threads->new( { scalar => 1 }, sub {
    while (defined( my $task = $input_q->dequeue )) {
      my $result = worker($task);
      $output_q->enqueue([$task, $result]);
    }
    # we get here when the input queue is exhausted.
    $all_finished_s->up;
    # end the output queue if we are the last thread (the semaphore is > 0).
    if ($all_finished_s->down_nb) {
      $output_q->enqueue(undef);
    }
  });
}

# fill the input queue with tasks
my @pieces_of_work = 1 .. 6;
$input_q->enqueue($_) for @pieces_of_work;

# finish the input queue
$input_q->enqueue(undef) for 1 .. $num_threads;

# do something with the data
while (defined( my $result = $output_q->dequeue )) {
  my ($task, $answer) = @$result;
  say "Task $task produced $answer";
}

# join the workers:
$_->join for @workers;

With worker defined as before, we get:

Task 1 produced 1.15207098293783
Task 4 produced 4.31247785766295
Task 5 produced 5.96967474718984
Task 6 produced 6.2695013168678
Task 2 produced 2.02545636412421
Task 3 produced 3.22281619053999

(The three threads are only joined after all output has been printed, so there is nothing interesting left to show for that step.)

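This second solution gets even simpler when we detach the threads: the main thread will not exit before all worker threads have finished their work, because it is listening on the queue it reads results from ($output_q), and that queue is only finished by the last thread.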
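
A rough sketch of what the detached variant might look like, under a few assumptions: the real `worker($task)` is replaced by a placeholder (`$task * 2`) so the example is self-contained, and results are collected into a hash (`%answer_for`, a name introduced here) so they can be printed in a stable order.

```perl
use threads; use strict; use warnings;
use feature 'say';
use Thread::Queue;
use Thread::Semaphore;

my $input_q  = Thread::Queue->new;
my $output_q = Thread::Queue->new;

my $num_threads    = 3;
my $all_finished_s = Thread::Semaphore->new(1 - $num_threads);

for (1 .. $num_threads) {
  # Detach immediately: no thread object is kept and nothing is joined later.
  threads->new(sub {
    while (defined( my $task = $input_q->dequeue )) {
      $output_q->enqueue([$task, $task * 2]);  # placeholder for worker($task)
    }
    $all_finished_s->up;
    # Only the last worker to get here sees a positive count
    # and ends the output queue.
    $output_q->enqueue(undef) if $all_finished_s->down_nb;
  })->detach;
}

$input_q->enqueue($_)    for 1 .. 6;
$input_q->enqueue(undef) for 1 .. $num_threads;  # one end marker per worker

# Blocks until the last worker ends the output queue.
my %answer_for;
while (defined( my $result = $output_q->dequeue )) {
  my ($task, $answer) = @$result;
  $answer_for{$task} = $answer;
}

say "Task $_ produced $answer_for{$_}" for sort { $a <=> $b } keys %answer_for;
```

The `@workers` array and the final join loop disappear. One caveat: if the main thread reaches the end of the program while a detached worker is still in the middle of returning, Perl may print a "Perl exited with active threads" warning, so joining remains the tidier choice when clean shutdown matters.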