Perl: Running recursive jobs in parallel

Asked: 2014-06-02 13:13:59

Tags: perl recursion parallel-processing

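I have a recursive function that calls a system command to list files and directories. For each directory it finds, it calls itself again.

This process can take a while, which is why I want to run the jobs in parallel.

I looked into ForkManager (Parallel::ForkManager), but it does not allow child processes to fork children of their own. Since the number of subprocesses should be limited to 10, I was thinking of a "worker" concept: 10 workers waiting for jobs to execute.

My recursive function: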

sub pullDataFromDbWithDirectory {
    my $_dir = $_[0];
    my @list = ();

    if ($itemCount < $maxNumberOfItems) {    # numeric comparison ("lt" compares as strings)
        my @retval = grep { /dir|file/ } map { s/^Dir\s+|^File\s+|\n//g; $_ } qx($omnidb -filesystem $filesystem  '$label'  -listdir '$_dir');

        foreach my $item (@retval) {
            $itemCount++;

            push(@list,$item) if $item =~ /^file/;

            if ($item =~ /^dir/) {
                my $subdir = "$_dir/$item";
                $data{$subdir} = ();

                if ($recursive) {
                    pullDataFromDbWithDirectory($subdir);
                }
            }
        }

        $data{$_dir} = \@list;
    }
}

Any help is much appreciated.

Update

Problem solved. Thanks for your input. I modified my code:

sub pullDataFromDbWithDirectory {
    my $_dir = $_[0];

    if ($itemCount <= $maxNumberOfItems) {
        my @retval = grep { /dir|file/ } map { s/^Dir\s+|^File\s+|\n//g; $_ } qx($omnidb -filesystem $filesystem  '$label'  -listdir '$_dir');

        foreach my $item (@retval) {
            $itemCount++;
            my $file = "$_dir/$item";
            push(@data,$file);

            if ($item =~ /^dir/) {
                $worker->enqueue($file);
                print "Add $file to queue\n" if $debug;
            }
        }
    }
}

sub doOperation {
    my $ithread = threads->tid();
    while (my $folder = $worker->dequeue()) {
        print "Read $folder from queue\n" if $debug;
        pullDataFromDbWithDirectory($folder);
    }
}

my @threads = map threads->create(\&doOperation), 1 .. $maxNumberOfParallelJobs;
pullDataFromDbWithDirectory($directory);
$worker->enqueue((undef) x $maxNumberOfParallelJobs);
$_->join for @threads;
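
One caveat with the code above: the main thread enqueues the `undef` end markers as soon as the top-level directory has been listed, while workers may still be adding subdirectories, so anything queued behind a marker can be left unprocessed. Below is a minimal sketch of one way around this. It counts the directories that are queued or in progress and closes the queue only when that count drops to zero. The names `list_entries`, `add_dir` and `$pending` are hypothetical; `list_entries` reads a local directory purely as a stand-in for the `omnidb` call, the demo tree is made up, and the sketch assumes a threaded perl and a Thread::Queue version that provides `end`.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

my $max_workers = 4;                  # assumption: the question uses 10
my $queue       = Thread::Queue->new();
my @found   :shared;                  # every path discovered so far
my $pending :shared = 0;              # directories queued or being processed

# Hypothetical stand-in for the omnidb call: list one directory level.
sub list_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return;
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return map { "$dir/$_" } @names;
}

# Count the directory as outstanding work before it enters the queue.
sub add_dir {
    my ($dir) = @_;
    { lock($pending); $pending++; }
    $queue->enqueue($dir);
}

sub worker {
    # dequeue() returns undef once the queue is ended and empty.
    while (defined(my $dir = $queue->dequeue())) {
        for my $path (list_entries($dir)) {
            { lock(@found); push @found, $path; }
            add_dir($path) if -d $path && !-l $path;
        }
        # Children were counted above, so zero means nothing is left anywhere.
        lock($pending);
        $queue->end() if --$pending == 0;
    }
}

# Demo tree (made up for this sketch).
my $root = tempdir(CLEANUP => 1);
make_path("$root/a/b", "$root/c");
for my $file ("$root/top.txt", "$root/a/one.txt", "$root/a/b/two.txt") {
    open(my $fh, '>', $file) or die "Cannot create $file: $!";
    close($fh);
}

my @workers = map { threads->create(\&worker) } 1 .. $max_workers;
add_dir($root);
$_->join for @workers;
print scalar(@found), " entries found\n";
```

Because a directory's children are counted before the directory itself is counted down, the counter cannot reach zero while work is still outstanding, and no end markers are needed.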

1 answer:

Answer 0 (score: 2)

I would rewrite your code to use an appropriate Perl module such as File::Find; it will be more efficient.

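use File::Find;

my @directories_to_search = ('.');    # assumption: replace with the real start directories
my %data;
find(\&wanted, @directories_to_search);

# Called once per entry: $_ is the entry name, $File::Find::dir its directory.
sub wanted {
    # Collect every entry per directory instead of overwriting the previous one
    push @{ $data{$File::Find::dir} }, $_;
}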

For parallel operation, I would use Thread::Queue like this:

use strict;
use warnings;
use threads;
use threads::shared;    # required for the ":shared" attribute
use Thread::Queue;

my $q = Thread::Queue->new();    # A new empty queue
my %seen :shared;

# Worker threads (5 here)
my @thrs = map { threads->create(\&doOperation) } 1 .. 5;
add_file_to_q('/tmp');
# Caveat: workers may still be adding directory contents at this point,
# and anything queued behind an end marker is never processed.
$q->enqueue('//_DONE_//') for @thrs;
$_->join() for @thrs;

sub add_file_to_q {
    my $dir = shift;
    # Note: paths containing spaces or shell metacharacters would need quoting
    my @files = `ls -1 $dir/`;
    chomp(@files);
    # Build full paths so the -d test in the worker checks the right entry
    @files = map { "$dir/$_" } @files;
    # Add files to the queue
    foreach my $f (@files) {
        # Send work to the threads
        $q->enqueue($f);
        print "Pending items: " . $q->pending() . "\n";
    }
}

sub doOperation {
    my $ithread = threads->tid();
    # dequeue() blocks until an item is available
    while (defined(my $filename = $q->dequeue())) {
        # Do work on $filename
        return 1 if $filename eq '//_DONE_//';
        next if $seen{$filename};
        print "[id=$ithread]\t$filename\n";
        $seen{$filename} = 1;
        ### Add the contents if it is a directory (watch out for symlinks; no file may be named //_DONE_//!)
        add_file_to_q($filename) if -d $filename;
    }
    return 1;
}
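
The `%seen` check in the worker tests an entry and marks it in two separate steps, so two threads could in principle both pass the test for the same name. A minimal sketch of doing the test and the mark under one lock follows. The `claim` helper and the item names are hypothetical and only illustrate the idea; a threaded perl is assumed.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %seen :shared;

# Returns true only for the first caller that claims $name.
sub claim {
    my ($name) = @_;
    lock(%seen);                 # held until the sub returns
    return !$seen{$name}++;      # test and mark in one step
}

my @names = map { "item$_" } 1 .. 20;

# Four threads all try to claim the same 20 names;
# each returns how many it managed to claim.
my @threads = map {
    threads->create({ context => 'scalar' },
        sub { scalar grep { claim($_) } @names });
} 1 .. 4;

my $total = 0;
$total += $_->join for @threads;
print "claimed $total of ", scalar(@names), " names\n";
```

In the worker loop this would replace the separate `next if $seen{...}` and `$seen{...} = 1` lines with a single `next unless claim($filename);`.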