How to access Google Scholar using Perl

Time: 2015-02-10 20:05:51

Tags: perl cgi www-mechanize lwp

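I am using the code below to try to search Google Scholar from my website. It works once or twice, and then I get the error "Error GETing http://scholar.google.com: Can't connect to scholar.google.com:80 (Permission denied)". The code I am using is as follows: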

use strict;
use WWW::Mechanize;
my $browser = WWW::Mechanize->new();
$browser->get('http://scholar.google.com');
$browser->form_name('f');
$browser->field('q','PCR');
$browser->submit();
print $browser->content();

Any hints or suggestions would be much appreciated.

1 answer:

Answer 0: (score: 1)

Your code is fine, but Google Scholar has decided not to allow access by "robots" such as LWP. See perlmonks/461130 for details.
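To see what a default LWP client announces about itself, the following minimal sketch prints the User-Agent string before and after overriding it. It makes no network request. Whether Google Scholar filters on this header specifically is an assumption based on the workaround in this answer, not something the code verifies.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# A freshly created LWP::UserAgent identifies itself as "libwww-perl/<version>".
my $ua = LWP::UserAgent->new;
my $default_agent = $ua->agent;
print "Default User-Agent:    $default_agent\n";

# Override the identification string; this value is sent with every later request.
$ua->agent('Mozilla/5.0');
my $custom_agent = $ua->agent;
print "Overridden User-Agent: $custom_agent\n";
```

WWW::Mechanize is a subclass of LWP::UserAgent, so the same agent() call should work on the $browser object from the question.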

Edit: I found a solution by passing a user agent and a cookie ID in the headers:

use HTTP::Request;
use HTTP::Cookies;
use LWP::UserAgent;

# randomize cookie id
use Digest::MD5 qw(md5_hex);
my $googleid = md5_hex(rand());

# escape query string
use URI::Escape;
my $query= uri_escape('search string');

# create request
my $request = HTTP::Request->new(GET => 'http://scholar.google.com/scholar?q='.$query);

# disguise as Mozilla
my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0');

# use random id for Cookie
my $cookies = HTTP::Cookies->new();
$cookies->set_cookie(0,'GSP', 'ID='.$googleid,'/','scholar.google.com');
$ua->cookie_jar($cookies);

# submit request
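my $response = $ua->request($request);
if ($response->is_success) {
    print $response->code;
    my $text = $response->decoded_content;
    # do something
}
else {
    # report why the request failed (e.g. a block or a connection error)
    warn "Request failed: ", $response->status_line, "\n";
}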
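Before sending anything to Google, it can help to check offline what the request will look like. The sketch below rebuilds the same request and cookie jar as above, then calls add_cookie_header() on the jar by hand (LWP normally does this itself when the request is sent) and prints the resulting URI and Cookie header. The variables $uri and $cookie_header are introduced here for illustration only.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use HTTP::Cookies;
use HTTP::Request;
use URI::Escape qw(uri_escape);

# Same random cookie id and escaped query as in the answer.
my $googleid = md5_hex(rand());
my $query    = uri_escape('search string');
my $request  = HTTP::Request->new(
    GET => 'http://scholar.google.com/scholar?q=' . $query);

# Same cookie jar setup as in the answer.
my $cookies = HTTP::Cookies->new;
$cookies->set_cookie(0, 'GSP', 'ID=' . $googleid, '/', 'scholar.google.com');

# Copy matching cookies from the jar into the request headers.
# No network traffic happens here.
$cookies->add_cookie_header($request);

my $uri           = $request->uri->as_string;
my $cookie_header = $request->header('Cookie');
print "URI:    $uri\n";
print "Cookie: $cookie_header\n";
```

If the Cookie line comes out empty, the domain or path passed to set_cookie() does not match the request URL, and the cookie would not be sent on the real request either.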