继续获取部分下载的文件

时间:2011-07-25 08:58:16

标签: perl http download wget

是否有一个Perl工具,其行为类似于wget --continue并且能够继续获取部分下载的文件?

3 个答案:

答案 0 :(得分:2)

AnyEvent::HTTP的文档包含使用HTTP 1.1恢复下载的能力的代码。我从来没用过它,所以我不能评论它的适用性。

显然,该示例希望您已经知道如何使用AnyEvent,当然,我不知道。您需要修改代码以使其具有预期已存在的事件循环:

#!/usr/bin/perl

use strict;
use warnings;

use AnyEvent::HTTP;

my $url  = "http://localhost/foo.txt";
my $file = "foo.txt";

sub download {
    my ($url, $file, $cb) = @_;

    open my $fh, "+>>:raw", $file
        or die "could not open $file: $!";

    my %hdr;
    my $ofs = 0;

    if (stat $fh and $ofs = -s _) {
        $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date((stat _)[9]);
        $hdr{"range"} = "bytes=$ofs-";
    }

    http_get $url, (
        headers   => \%hdr,
        on_header => sub {
            my ($hdr) = @_;

            if ($hdr->{Status} == 200 && $ofs) {
                # resume failed
                truncate $fh, $ofs = 0;
            }

            sysseek $fh, $ofs, 0;

            return 1;
        },
        on_body   => sub {
            my ($data, $hdr) = @_;

            if ($hdr->{Status} =~ /^2/) {
                length $data == syswrite $fh, $data
                    or return; # abort on write errors
            }

            return 1;
        },
        sub {
            my (undef, $hdr) = @_;

            my $status = $hdr->{Status};

            if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
                utime $fh, $time, $time;
            }

            if ($status == 200 || $status == 206 || $status == 416) {
                # download ok || resume ok || file already fully downloaded
                $cb->(1, $hdr);

            } elsif ($status == 412) {
                # file has changed while resuming, delete and retry
                unlink $file;
                $cb->(0, $hdr);

            } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
                # retry later
                $cb->(0, $hdr);

            } else {
                $cb->(undef, $hdr);
            }
        }
    );
}

my $quit = AnyEvent->condvar; #create a handle to exit the loop

download $url, $file, sub {
    if ($_[0]) {
        print "OK!\n";
    } elsif (defined $_[0]) {
        print "please retry later\n";
    } else {
        print "ERROR\n";
    }
    $quit->send; #quit the loop
};

$quit->recv; #start the loop

使其成功的关键是$quit条件变量:

my $quit = AnyEvent->condvar; #handle to exit the loop
.
.
.
$quit->recv;

这会设置一个事件循环。如果没有事件循环,程序将在调用http_get之前有机会除了创建文件之外执行任何操作。要退出事件循环,我们会在$quit->send函数的回调中调用download

答案 1 :(得分:1)

我在Google的帮助下找到了这一点(需要特定的搜索字词,而不是说GIYF)。 Link to Google Translate

他们的示例代码是

use strict;
use LWP::UserAgent;

my $u = "http://www.mangafox.com/media/manga.banner.png";
my $f = "tmp.jpg"; 

my $ua = LWP::UserAgent->new();

## Uncomment for test
# unlink $f;
# system("cp tmp.jpg tmp0.jpg");
# system( "head -c 10000 tmp0.jpg > tmp.jpg" );

download($u, $f);

sub download  {
    my ($url, $file) = @_;

    my ($tries, @parameters, $FD);

    @parameters = ( $url,                  
                    ":content_cb" => sub { my ($chunk) = @_;
                                           print $FD $chunk;
                                         } 
                  ); 
    $tries = 4;

    while ( $tries ) {

        open($FD, ">>$file") || die "ERROR: $!";      

        my $bytes = -s $file;        
        if ( $bytes > 0 ) { push(@parameters, "Range" => "bytes=$bytes-" ) }

        my $res =$ua->get( @parameters );
        print $res->status_line . "\n";
        close $FD;

        # 416 Requested Range Not Satisfiable
        # (file already fully downloaded)

        if ( $res->is_success || $res->code == 416 ) { return }        
        $tries --;
    }

    die "ERROR: download $url";   
}

# Test with ImageMagick
system("identify tmp.jpg");

我不是说这有效或适合你。只是我发现的东西。使用风险自负。

答案 2 :(得分:1)

尝试过(2次)并且有效。

#!/usr/local/bin/perl
use warnings;
use 5.014;
use utf8;
use LWP::UserAgent;
use File::Basename;

my $url = 'http://server/somelargefile';
my $file = basename $url;
my $ua = LWP::UserAgent->new( show_progress => 1 );

open my $fh, '>>:raw', $file or die $!; 
my $bytes = -s $file;
my $res;       
if ( $bytes ) { 
    say "resume download: $file ($bytes)";
    $res = $ua->get( 
        $url,
        'Range' => "bytes=$bytes-",
        ':content_cb' => sub { my ( $chunk ) = @_; print $fh $chunk; }
    );
} else {
    say "start download";
    $res = $ua->get( 
        $url,
        ':content_cb' => sub { my ( $chunk ) = @_; print $fh $chunk; } 
    );
}
close $fh;

my $status = $res->status_line;
if ( $status =~ /^(200|206|416)/ ) {
    say "OK" if $status =~ /^20[06]/;
    say "$file already downloaded" if $status =~ /^416/;
} else {
    say $status;
}