Wget, cookies and 302 redirects

Time: 2010-12-19 02:27:35

Tags: command-line cookies centos wget

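I want to use wget to download some images from one section of a website. The site is password protected. I have logged in successfully and saved the cookies, but I still cannot download the images because of 302 redirects. Can anyone help me take a look? Thanks a lot.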

wget --load-cookies=examplecookies  http://members.example.com/membersarea/0004.jpg
--2010-12-18 18:58:50--  http://members.example.com/membersarea/0004.jpg
Resolving members.example.com... 12.34.56.78
Connecting to members.example.com|12.34.56.78|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: /login.aspx?ReturnUrl=%2fmembersarea%2f0004.jpg [following]
--2010-12-18 18:58:50--  http://members.example.com/login.aspx?ReturnUrl=%2fmembersarea%2f0004.jpg
Reusing existing connection to members.example.com:80.
HTTP request sent, awaiting response... 302 Found
Location: /membersarea/default.aspx [following]
--2010-12-18 18:58:50--  http://members.example.com/membersarea/default.aspx
Reusing existing connection to members.example.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 61898 (60K) [text/html]
Saving to: `default.aspx'

100%[===================================================================================>] 61,898      --.-K/s   in 0.1s

2010-12-18 18:58:51 (572 KB/s) - `default.aspx' saved [61898/61898]

default.aspx is the home page of the members area, which means I am logged in successfully.

I did some googling and added --user-agent="Mozilla/4.0", but it still does not work:

wget --user-agent="Mozilla/4.0" --load-cookies=examplecookies  http://members.example.com/membersarea/0004.jpg

The result is the same.

Thanks a lot!

1 answer:

Answer 0 (score: 3)

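I always used to have trouble with wget and cookies (trying to get wget to use my Mozilla cookies and so on), so I switched to the Perl library WWW::Mechanize. It handles cookies for you, along with all the usual things you would expect from a browser, such as 302 handling and history.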

A simple example that logs in to a site, grabs all the JPGs, and follows the "Next" link to paginate:

use warnings;
use strict;
use WWW::Mechanize;
use File::Slurp;

my $mech = WWW::Mechanize->new;
$mech->get('http://example.com/login') || die;
$mech->submit_form( form_name => 'login_form',
                    fields => { username => 'me',
                                password => 'secret' } ) || die;

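while (1) {
   # The link list is evaluated once here, before $mech->get changes the page
   for my $link ($mech->links) {
      my $url = $link->url;
      if ($url =~ /(image_\d+\.jpg)\z/) {
         my $file = $1;
         $mech->get($url);
         # JPEGs are binary, so write the file in raw mode
         File::Slurp::write_file($file, { binmode => ':raw' }, $mech->content);
         $mech->back; # like the browser back button
      }
   }
   # Look for a "Next" link and stop when there is none. find_link returns
   # undef when nothing matches, whatever the autocheck setting.
   my $next = $mech->find_link(text_regex => qr/Next/) or last;
   $mech->get($next->url);
}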
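
If you would rather keep using wget, it can help to check what is actually in the cookie file before blaming the redirects. The sketch below lists the cookies in a Netscape-format cookie file, which is the format wget's --save-cookies writes. The function name list_cookies is made up for this example, and it assumes session cookies show up with an expiry of 0, which is how they typically appear when saved with --keep-session-cookies.

```shell
#!/bin/sh
# list_cookies is a hypothetical helper, not part of wget.
# It reads a Netscape-format cookie file: seven tab-separated fields per
# line (domain, subdomain flag, path, secure flag, expiry, name, value).
list_cookies() {
    awk -F '\t' '
        /^#/ || NF < 7 { next }   # skip comment lines and blank or short lines
        {
            # Assumption: an expiry of 0 marks a session cookie
            kind = ($5 == 0) ? "session" : "expires=" $5
            printf "%s\t%s\t%s\n", $1, $6, kind
        }
    ' "$1"
}
```

Running list_cookies examplecookies prints one line per cookie with its domain, name, and either "session" or the expiry timestamp. If the site's login cookie is missing from the output, one possible cause (a guess, not something the question confirms) is that it was a session cookie and the login request ran without --keep-session-cookies, in which case wget would not have written it to the file.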