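I want to use wget to download some images from one section of a website. The site is password protected. I have logged in successfully and saved the cookies, but I still cannot download the images because of a 302 redirect. Can anyone help me figure this out? Thanks a lot.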
wget --load-cookies=examplecookies http://members.example.com/membersarea/0004.jpg
--2010-12-18 18:58:50-- http://members.example.com/membersarea/0004.jpg
Resolving members.example.com... 12.34.56.78
Connecting to members.example.com|12.34.56.78|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: /login.aspx?ReturnUrl=%2fmembersarea%2f0004.jpg [following]
--2010-12-18 18:58:50-- http://members.example.com/login.aspx?ReturnUrl=%2fmembersarea%2f0004.jpg
Reusing existing connection to members.example.com:80.
HTTP request sent, awaiting response... 302 Found
Location: /membersarea/default.aspx [following]
--2010-12-18 18:58:50-- http://members.example.com/membersarea/default.aspx
Reusing existing connection to members.example.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 61898 (60K) [text/html]
Saving to: `default.aspx'
100%[===================================================================================>] 61,898 --.-K/s in 0.1s
2010-12-18 18:58:51 (572 KB/s) - `default.aspx' saved [61898/61898]
default.aspx is the home page of the members area, which means I have logged in successfully.
After some googling I added --user-agent="Mozilla/4.0", but it still does not work:
wget --user-agent="Mozilla/4.0" --load-cookies=examplecookies http://members.example.com/membersarea/0004.jpg
The result is the same.
Thanks a lot!
Answer 0 (score: 3)
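I used to run into problems with wget and cookies all the time (trying to get wget to use my Mozilla cookies, and so on), so I switched to the Perl library WWW::Mechanize. It handles cookies for you, along with the other things you would expect from a browser, such as following 302 redirects and keeping a history.
A simple example that logs in to a site, grabs all the JPGs, and follows the "Next" link to page through the results: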
use warnings;
use strict;
use WWW::Mechanize;
use File::Slurp;

my $mech = WWW::Mechanize->new;
$mech->get('http://example.com/login') || die;
$mech->submit_form(
    form_name => 'login_form',
    fields    => {
        username => 'me',
        password => 'secret',
    },
) || die;
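while (1) {
    # Save every link on the current page that points at an image_N.jpg file.
    for my $link ($mech->links) {
        my $url = $link->url;
        if ($url =~ /(image_\d+\.jpg)\z/) {
            my $file = $1;
            $mech->get($url);
            # Write raw bytes so the image is not mangled on any platform.
            File::Slurp::write_file($file, { binmode => ':raw' }, $mech->content);
            $mech->back; # like the browser back button
        }
    }
    # Look at the next page, if any. find_link returns undef when there is no
    # match, whereas follow_link may die under the default autocheck setting.
    my $next = $mech->find_link(text_regex => qr/Next/);
    last if !$next;
    $mech->get($next->url);
}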
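
If you would rather stay with wget, one thing worth ruling out is that the login only set session cookies, which wget does not write to the cookie file unless --keep-session-cookies is given. Below is a minimal sketch of that idea, not a confirmed fix for this site. The form field names (username, password) and the use of login.aspx as the POST target are assumptions, and an ASP.NET login form may expect additional hidden fields that this sketch does not send.

```shell
#!/bin/sh
# Sketch only: the form field names and the POST target are assumptions;
# check the real login form before relying on this.

COOKIES=examplecookies
BASE=http://members.example.com

# Log in once and keep session cookies, which wget leaves out of the
# cookie file unless --keep-session-cookies is given.
# Arguments: user name, password (both must already be URL-encoded).
login() {
    "${WGET:-wget}" --save-cookies="$COOKIES" --keep-session-cookies \
        --post-data="username=$1&password=$2" \
        -O /dev/null "$BASE/login.aspx"
}

# Fetch one file from the members area with the saved cookies.
# --max-redirect=0 makes wget fail on a 302 instead of quietly saving
# whatever page the redirect leads to.
fetch_image() {
    "${WGET:-wget}" --load-cookies="$COOKIES" --max-redirect=0 \
        "$BASE/membersarea/$1"
}
```

Typical use would be `login me secret && fetch_image 0004.jpg`. Setting WGET=echo prints the wget arguments instead of running them, which is a cheap way to check the command line before touching the server.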