Question

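I have a Perl script that runs two external programs, one dependent on the other, for a series of datasets. Currently, I process the datasets one at a time: I run each one through the first program, gather the results using qx, and use those results to run the second program. The results of the second program are appended to an output file, one file per dataset. I've created a simple reproducible example that hopefully captures my current approach: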

#!/usr/bin/perl
#
# stackoverflow_q_7-7-2016.pl

use warnings;
use strict;

my @queries_list = (2, 4, 3, 1);

foreach my $query (@queries_list) {
    #Command meant to simulate the first, shorter process, and return a list of results for the next process
    my $cmd_1 = "sleep " . $query . "s; shuf -i 4-8 -n 3";
    print "Running program_1 on query $query...\n";
    my @results = qx($cmd_1);

    foreach (@results) {
        chomp $_;
        #Command meant to simulate a longer process whose input depends on program_1; the output I write to a separate file for each query
        my $cmd_2 = "sleep " . $_ . "s; fortune -s | head -c " . $_ * 5 . " >> $query.output";
        print "\tRunning program_2 on query $query with input param $_...\n";
        system($cmd_2);
    }
}

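Since the first program usually finishes faster than the second, I thought it might be possible to speed the whole thing up by continuing to run new queries through program_1 while program_2 is still running on the previous query. Speeding this up would be great, since it currently takes many hours of processing to finish. However, I'm not sure how to go about it. Would something like Parallel::ForkManager offer a solution? Or using threads in Perl?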

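In my actual code I also do some error handling and set a timeout for program_2. I use fork, exec and $SIG{ALRM} to do this, but I don't really know what I'm doing. It's important that I keep the ability to do this, otherwise program_2 could get stuck or inadequately report why it failed. Below is what the code looks like with error handling. I don't think it works quite as it should in the reproducible example, but hopefully you can at least see what I'm trying to do. Here is the version with error handling: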

#!/usr/bin/perl
#
# stackoverflow_q_7-7-2016.pl

use warnings;
use strict;

my @queries_list = (2, 4, 3, 1);

foreach my $query (@queries_list) {
    #Command meant to simulate the first, shorter process, and return a list of results for the next process
    my $cmd_1 = "sleep " . $query . "s; shuf -i 4-15 -n 3";
    print "Running program_1 on query $query...\n";
    my @results = qx($cmd_1);

    foreach (@results) {
        chomp $_;
        #Command meant to simulate a longer process whose input depends on program_1; the output I write to a separate file for each query
        my $cmd_2 = "sleep " . $_ . "s; fortune -s | head -c " . $_ * 3 . " >> $query.output";
        print "\tRunning program_2 on query $query with input param $_...\n";

        my $childPid;
        eval {
            local $SIG{ALRM} = sub { die "Timed out" };
            alarm 10;
            if ($childPid = fork()) {
                wait();
            } else {
                exec($cmd_2);
            }
            alarm 0;
        };
        if ($? != 0) {
            my $exitCode = $? >> 8;
            print "Program_2 exited with error code $exitCode. Retry...\n";
        }
        if ($@ =~ /Timed out/) {
            print "\tProgram_2 timed out. Skipping...\n";
            kill 2, $childPid;
            wait;
        };
    }
}

Thanks for any help.

Answer 1

One solution:

use threads;

use Thread::Queue;  # 3.01+

sub job1 { ... }
sub job2 { ... }

{
   my $job1_request_queue = Thread::Queue->new();
   my $job2_request_queue = Thread::Queue->new();

   my $job1_thread = async {
      # Test with defined() so a false job value (0 or "") does not end the loop early
      while (defined(my $job = $job1_request_queue->dequeue())) {
         my $result = job1($job);
         $job2_request_queue->enqueue($result);
      }

      $job2_request_queue->end();
   };

   my $job2_thread = async {
      # dequeue() returns undef once the queue has been ended and drained
      while (defined(my $job = $job2_request_queue->dequeue())) {
         job2($job);
      }
   };

   $job1_request_queue->enqueue($_) for ...;

   $job1_request_queue->end();    
   $_->join() for $job1_thread, $job2_thread;
}

You can even have multiple workers of either or both types.

use threads;

use Thread::Queue;  # 3.01+

use constant NUM_JOB1_WORKERS => 1;
use constant NUM_JOB2_WORKERS => 3;

sub job1 { ... }
sub job2 { ... }

{
   my $job1_request_queue = Thread::Queue->new();
   my $job2_request_queue = Thread::Queue->new();

   my @job1_threads;
   for (1..NUM_JOB1_WORKERS) {
      push @job1_threads, async {
         # Test with defined() so a false job value (0 or "") does not end the loop early
         while (defined(my $job = $job1_request_queue->dequeue())) {
            my $result = job1($job);
            $job2_request_queue->enqueue($result);
         }
      };
   }

   my @job2_threads;
   for (1..NUM_JOB2_WORKERS) {
      push @job2_threads, async {
         # dequeue() returns undef once the queue has been ended and drained
         while (defined(my $job = $job2_request_queue->dequeue())) {
            job2($job);
         }
      };
   }

   $job1_request_queue->enqueue($_) for ...;

   $job1_request_queue->end();    
   $_->join() for @job1_threads;
   $job2_request_queue->end();
   $_->join() for @job2_threads;
}
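To make the skeleton above concrete, here is a minimal runnable sketch of the same pipeline. The bodies of job1 and job2 are hypothetical stand-ins (they do not call the real programs), the $done_queue used to collect results is my own addition, and the query list is taken from the question. It assumes a perl built with thread support and Thread::Queue 3.01 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;   # 3.01+ for end()

use constant NUM_JOB2_WORKERS => 3;

# Hypothetical stand-in for program_1: yields three parameters per query.
sub job1 {
    my ($query) = @_;
    my @jobs;
    for my $i (1 .. 3) {
        push @jobs, { query => $query, param => $query * 10 + $i };
    }
    return @jobs;
}

# Hypothetical stand-in for program_2: just formats its input.
sub job2 {
    my ($job) = @_;
    return "$job->{query}:$job->{param}";
}

my $job1_queue = Thread::Queue->new();
my $job2_queue = Thread::Queue->new();
my $done_queue = Thread::Queue->new();   # assumption: collects job2 results

# One job1 worker: each query fans out into several job2 requests.
my $job1_thread = async {
    while (defined(my $query = $job1_queue->dequeue())) {
        $job2_queue->enqueue($_) for job1($query);
    }
};

# Several job2 workers consume requests as soon as they are available.
my @job2_threads;
for (1 .. NUM_JOB2_WORKERS) {
    push @job2_threads, async {
        while (defined(my $job = $job2_queue->dequeue())) {
            $done_queue->enqueue(job2($job));
        }
    };
}

$job1_queue->enqueue($_) for 2, 4, 3, 1;

# Shut down in pipeline order: job1 first, then job2.
$job1_queue->end();
$job1_thread->join();
$job2_queue->end();
$_->join() for @job2_threads;
$done_queue->end();

my @done;
while (defined(my $line = $done_queue->dequeue())) {
    push @done, $line;
}
print "$_\n" for sort @done;
```

The job2 workers finish in no particular order, which is why the results are sorted before printing. In the real script, job2 would append to the per-query output file instead of returning a string.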

Use IPC::Run instead of qx to add a timeout. There's no need for signals.
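As a rough sketch of what that could look like: IPC::Run's run accepts a timeout(...) argument and throws an exception when the time limit expires. The helper name run_with_timeout and its return convention are hypothetical, chosen only for illustration, and matching the exception text against /timeout/ assumes IPC::Run's default timeout message. IPC::Run is a CPAN module and must be installed separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Run qw(run timeout);

# Hypothetical helper: run a command (given as an array ref) with a time limit.
# Returns a status string ('ok', 'failed' or 'timeout') and the captured stdout.
sub run_with_timeout {
    my ($cmd, $seconds) = @_;
    my ($in, $out, $err) = ('', '', '');

    # run() returns true if the command exited with status 0,
    # and dies if the timeout expires first.
    my $ok = eval { run($cmd, \$in, \$out, \$err, timeout($seconds)) };
    if (my $exception = $@) {
        return ('timeout', $out) if $exception =~ /timeout/;
        die $exception;   # some other failure, e.g. the command was not found
    }
    return ($ok ? 'ok' : 'failed', $out);
}

# A shell pipeline such as $cmd_2 can be passed through sh -c.
my ($status, $output) = run_with_timeout(['sh', '-c', 'echo hello'], 5);
print "status=$status output=$output";
```

Inside job2, the returned status could then drive the retry or skip logic that the question currently implements with fork, exec and $SIG{ALRM}.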

Perl - Parallel programming - running two external programs
