我需要知道如何实现多线程

时间:2018-04-05 23:17:09

标签: multithreading perl

我正在尝试实现多线程功能perl脚本以提高速度 我正在尝试实现多线程功能perl脚本以提高速度

我需要知道如何为以下perl代码实现多线程

#!/usr/bin/perl

use if $^O eq "MSWin32", Win32::Console::ANSI;
use Getopt::Long;
use HTTP::Request;
use LWP::UserAgent;
use IO::Select;
use HTTP::Headers;
use IO::Socket;
use HTTP::Response;
use Term::ANSIColor;
use HTTP::Request::Common qw(POST);
use HTTP::Request::Common qw(GET);
use URI::URL;
use IO::Socket::INET;
use Data::Dumper;
use LWP::Simple;
use LWP;
use URI;
use JSON qw( decode_json encode_json );
use threads;

my $ua = LWP::UserAgent->new;
$ua    = LWP::UserAgent->new(keep_alive => 1);
$ua->agent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31");


{
    chomp($site);
    push(@threads, threads->create (\&ask, \&baidu, $site));
    sleep(1) while(scalar threads->list(threads::running) >= 50);
}

eval {
    $_->join foreach @threads;
    @threads = ();
};

########### ASK ###########

sub ask {
    for ( $i = 0; $i < 20; $i += 1) {

        my $url = "https://www.ask.com/web?o=0&l=dir&qo=pagination&q=site%3A*.fb.com+-www.fb.com&qsrc=998&page=$i";

        my $request  = $ua->get($url);
        my $response = $request->content;

        while( $response =~ m/((https?):\/\/([^"\>]*))/g ) {

            my $link = $1;

            my $site = URI->new($link)->host;
            if ( $site =~ /$s/ ) {
                if ( $site !~ /</ ) {   
                    print "ask: $site\n";
                }
            } 
        }
    } 
}


########### Baidu ###########

sub baidu {
    for ( my $ii = 10; $ii <= 760; $ii += 10 ) {

        my $url = "https://www.baidu.com/s?pn=$ii&wd=site:fb.com&oq=site:fb.com";


        my $request  = $ua->get($url);
        my $response = $request->content;

        while ( $response =~ m/(style="text-decoration:none;">([^\/]*))/g ) {
            my $site = $1;
            $site =~ s/style="text-decoration:none;">//g;
            if ( $site =~ /$s/ ) {
                print "baidu: $site\n";
            }
        }
    }
}

如果运行此代码,我只从Ask.com获得结果。 我如何解决这个问题并感谢所有人?

C:\Users\USER\Desktop>k.pl -d fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com    ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com

1 个答案:

答案 0 :(得分:6)

好的,首先关闭 - 你在这里做了一些看起来非常狡猾的事情,我建议你需要退后一步检查你的代码。它看起来有点“货物崇拜”。感谢像:

use HTTP::Request::Common qw(POST);
use HTTP::Request::Common qw(GET);

或者:

my $ua = LWP::UserAgent->new;
$ua    = LWP::UserAgent->new(keep_alive => 1);

...您正在创建新的LWP::UserAgent实例,然后......创建另一个具有不同参数的实例。

由于您没有包含最重要的use项,因此您也遇到了大量错误:

use strict;
use warnings qw ( all );

首先打开它们,然后修复错误。

但是这里举例:

push(@threads, threads->create (\&ask, \&baidu, $site));

您认为这条线应该怎么做?因为这里实际发生的是你尝试调用ask sub,然后将代码引用的参数传递给baidu sub,并传递一个字符串$site - 这是在代码中的这一点未定义。但这是学术界的,因为你从来没有在你的子程序中读过它们。

因此,您的代码无法正常工作并不奇怪 - 这是无稽之谈。

但除此之外 - perl的线程模型经常被误解。它不像你在其他编程语言中想到的轻量级线程 - 实际上它是相当重量级的。

您每次迭代都会创建并生成一个线程,这也不是很有效。

您真正想要做的是使用Thread::Queue

产生一小部分工人&#39;每个任务的线程,让它们从队列中读取,并单独完成它们的工作。

end队列完成后,让线程退出并由主进程获取。

回答中的内容:Perl daemonize with child daemons

...但是你确定没有一个模块可以做你想要的吗?