我正在尝试实现多线程功能perl脚本以提高速度 我正在尝试实现多线程功能perl脚本以提高速度
我需要知道如何为以下perl代码实现多线程
#!/usr/bin/perl
use if $^O eq "MSWin32", Win32::Console::ANSI;
use Getopt::Long;
use HTTP::Request;
use LWP::UserAgent;
use IO::Select;
use HTTP::Headers;
use IO::Socket;
use HTTP::Response;
use Term::ANSIColor;
use HTTP::Request::Common qw(POST);
use HTTP::Request::Common qw(GET);
use URI::URL;
use IO::Socket::INET;
use Data::Dumper;
use LWP::Simple;
use LWP;
use URI;
use JSON qw( decode_json encode_json );
use threads;
my $ua = LWP::UserAgent->new;
$ua = LWP::UserAgent->new(keep_alive => 1);
$ua->agent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31");
{
chomp($site);
push(@threads, threads->create (\&ask, \&baidu, $site));
sleep(1) while(scalar threads->list(threads::running) >= 50);
}
eval {
$_->join foreach @threads;
@threads = ();
};
########### ASK ###########
sub ask {
for ( $i = 0; $i < 20; $i += 1) {
my $url = "https://www.ask.com/web?o=0&l=dir&qo=pagination&q=site%3A*.fb.com+-www.fb.com&qsrc=998&page=$i";
my $request = $ua->get($url);
my $response = $request->content;
while( $response =~ m/((https?):\/\/([^"\>]*))/g ) {
my $link = $1;
my $site = URI->new($link)->host;
if ( $site =~ /$s/ ) {
if ( $site !~ /</ ) {
print "ask: $site\n";
}
}
}
}
}
########### Baidu ###########
sub baidu {
for ( my $ii = 10; $ii <= 760; $ii += 10 ) {
my $url = "https://www.baidu.com/s?pn=$ii&wd=site:fb.com&oq=site:fb.com";
my $request = $ua->get($url);
my $response = $request->content;
while ( $response =~ m/(style="text-decoration:none;">([^\/]*))/g ) {
my $site = $1;
$site =~ s/style="text-decoration:none;">//g;
if ( $site =~ /$s/ ) {
print "baidu: $site\n";
}
}
}
}
如果运行此代码,我只从Ask.com
获得结果。
我如何解决这个问题并感谢所有人?
C:\Users\USER\Desktop>k.pl -d fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com
答案 0 :(得分:6)
好的,首先关闭 - 你在这里做了一些看起来非常狡猾的事情,我建议你需要退后一步检查你的代码。它看起来有点“货物崇拜”。感谢像:
use HTTP::Request::Common qw(POST);
use HTTP::Request::Common qw(GET);
或者:
my $ua = LWP::UserAgent->new;
$ua = LWP::UserAgent->new(keep_alive => 1);
...您正在创建新的LWP::UserAgent
实例,然后......创建另一个具有不同参数的实例。
由于您没有包含最重要的use
项,因此您也遇到了大量错误:
use strict;
use warnings qw ( all );
首先打开它们,然后修复错误。
但是这里举例:
push(@threads, threads->create (\&ask, \&baidu, $site));
您认为这条线应该怎么做?因为这里实际发生的是你尝试调用ask
sub,然后将代码引用的参数传递给baidu
sub,并传递一个字符串$site
- 这是在代码中的这一点未定义。但这是学术界的,因为你从来没有在你的子程序中读过它们。
因此,您的代码无法正常工作并不奇怪 - 这是无稽之谈。
但除此之外 - perl
的线程模型经常被误解。它不像你在其他编程语言中想到的轻量级线程 - 实际上它是相当重量级的。
您每次迭代都会创建并生成一个线程,这也不是很有效。
您真正想要做的是使用Thread::Queue
。
产生一小部分工人&#39;每个任务的线程,让它们从队列中读取,并单独完成它们的工作。
end
队列完成后,让线程退出并由主进程获取。
回答中的内容:Perl daemonize with child daemons
...但是你确定没有一个模块可以做你想要的吗?