Question

I found an answer on making HTTP requests with threads here.

I just want to ask about this part of the accepted answer:

for my $url ('http://www.google.com/', 'http://www.perl.org/') {
   push @threads, async { $ua->get($url) };
}

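If I have more than 20K URLs to fetch, is pushing onto the @threads array in this for loop the right approach? Or should I restructure it to handle a list of 20K+ items? How can I do that without crashing my system? Thanks.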

Answer 1

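That's quite a few threads to launch. It might be below the thread limit for your system, so it depends on how many resources you can devote to this job.

If you would rather use a worker pool, Parallel::ForkManager is a popular module.

The module's documentation offers this example for a mass downloader: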

use LWP::Simple;
use Parallel::ForkManager;

...

@links=(
  ["http://www.foo.bar/rulez.data","rulez_data.txt"],
  ["http://new.host/more_data.doc","more_data.doc"],
  ...
);

...

# Max 30 processes for parallel download
my $pm = Parallel::ForkManager->new(30);

foreach my $linkarray (@links) {
  $pm->start and next; # do the fork

  my ($link,$fn) = @$linkarray;
  warn "Cannot get $fn from $link"
    if getstore($link,$fn) != RC_OK;

  $pm->finish; # do the exit in the child process
}
$pm->wait_all_children;

LWP::UserAgent does not have the getstore sub that LWP::Simple provides, but its mirror method behaves similarly.
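As a rough sketch of that substitution, the loop from the documentation example could be wrapped in a sub that uses LWP::UserAgent's mirror instead of getstore. The sub name download_all, the 30-second timeout, and the choice to report failures through the child's exit code are assumptions made for illustration, not part of the module's documentation.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use Parallel::ForkManager;

# Hypothetical helper: download each [url, filename] pair using at most
# $max_procs child processes at a time. Returns the URLs that failed.
sub download_all {
    my ($links, $max_procs) = @_;

    my $pm = Parallel::ForkManager->new($max_procs);
    my @failed;

    # Runs in the parent each time a child exits.
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident) = @_;
        push @failed, $ident if $exit_code != 0;
    });

    for my $linkarray (@$links) {
        my ($link, $fn) = @$linkarray;

        $pm->start($link) and next;    # parent: go on to the next URL

        # Child process: one user agent per child.
        my $ua  = LWP::UserAgent->new(timeout => 30);
        my $res = $ua->mirror($link, $fn);

        # mirror answers 304 when the local file is already up to date.
        my $ok = $res->is_success || $res->code == 304;
        warn "Cannot get $fn from $link: " . $res->status_line . "\n"
            unless $ok;

        $pm->finish($ok ? 0 : 1);      # exit the child
    }

    $pm->wait_all_children;
    return @failed;
}
```

Calling download_all(\@links, 30) with the @links array from the example above would keep the same limit of 30 concurrent processes. Unlike getstore, mirror skips the transfer when the local file is not older than the remote one, which may or may not be what you want for a one-off bulk fetch.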

Answer 2

You can easily do a worker pool with threads, too.

use threads;

use Thread::Queue 3.01 qw( );

use constant NUM_WORKERS => 30;

sub process {
   my ($url) = @_;
   ... $ua->get($url) ...
}

my $q = Thread::Queue->new();

my @workers;
for (1..NUM_WORKERS) {
   async {
      while (defined(my $job = $q->dequeue())) {   # undef means the queue was ended
         process($job);
      }
   };
}

$q->enqueue($_) for @urls;
$q->end();

$_->join() for threads->list();

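The Parallel::ForkManager solution posted by rutter creates 20,000 workers in total, one per URL (though at most 30 run at a time). This one creates 30 and reuses them.

That said, Net::Curl::Multi is far better at this.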
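To make the worker count concrete, here is a minimal offline sketch of the same thread pool that also collects results through a second queue. The sub fake_fetch is a hypothetical stand-in for $ua->get($url), the example.com URLs are placeholders, and the pool size of 4 is arbitrary; it requires a perl built with thread support.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue 3.01;

use constant NUM_WORKERS => 4;

# Hypothetical stand-in for the real HTTP request so the sketch runs offline.
sub fake_fetch {
    my ($url) = @_;
    return length $url;
}

my $jobs    = Thread::Queue->new();
my $results = Thread::Queue->new();

# Create the fixed pool first. Each thread loops until the job queue is
# ended and drained, at which point dequeue returns undef.
my @workers = map {
    threads->create(sub {
        while (defined(my $url = $jobs->dequeue())) {
            $results->enqueue($url . "\t" . fake_fetch($url));
        }
    });
} 1 .. NUM_WORKERS;

# Build the URL list after spawning the workers so it is not copied
# into every thread.
my @urls = map { "http://example.com/page/$_" } 1 .. 200;

$jobs->enqueue(@urls);
$jobs->end();                  # no more jobs will be added
$_->join() for @workers;       # wait for the pool to finish

# All workers are done, so a non-blocking drain sees every result.
my $done = 0;
while (defined(my $line = $results->dequeue_nb())) {
    $done++;
}

printf "%d URLs handled by a pool of %d threads\n", $done, scalar @workers;
```

The number of threads stays at NUM_WORKERS no matter how long @urls is, which is why this scales to 20,000 URLs where one thread per URL might not.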

Answer 3

I'd suggest letting POE handle this sort of thing.

http://poe.perl.org/?POE_Cookbook

Specifically:

http://poe.perl.org/?POE_Cookbook/Web_Client
