在Google云端硬盘文档上使用LWP :: UserAgent的内容处理

时间:2015-10-19 14:53:05

标签: perl http-headers wget lwp content-disposition

我正在尝试使用从服务器收到的实际文件名保存Google云端硬盘共享文件(内容处理):

我试图分析标题:

use strict;
use warnings;
use Data::Dumper;
use LWP::UserAgent qw( );
my $str = 'https://drive.google.com/file/d/0B6vqTWO9kmdmdzk5ejhDSXgzMDg/view?usp=sharing';
$str =~ /file\/d\/(\w+)/;
my $url = 'https://drive.google.com/uc?export=download&id='.$1;
my $ua = LWP::UserAgent->new();
my $response = $ua->head($url)->{'_headers'};
print Dumper( $response );

我明白了:

$VAR1 = bless( {
'client-ssl-cert-subject' => '/C=US/ST=California/L=Mountain View/O=Google Inc/CN=*.googleusercontent.com',
'connection' => 'close',
'date' => 'Mon, 19 Oct 2015 14:45:45 GMT',
'content-type' => 'text/html; charset=UTF-8',
'x-guploader-uploadid' => 'AEnB2UrQJIoJUIIhWnKz9HAlW_2XKApLe_0IDMZjS0gGQOMdRaF68Od2xsxssp7mBdQP9kNrjvDueWUP5pSa1eHbprSjbPvfbA',
'alternate-protocol' => '443:quic,p=1',
'expires' => 'Mon, 19 Oct 2015 14:45:45 GMT',
'::std_case' => {
   'client-ssl-socket-class' => 'Client-SSL-Socket-Class',
   'client-ssl-cert-subject' => 'Client-SSL-Cert-Subject',
   'client-ssl-cipher' => 'Client-SSL-Cipher',
   'client-peer' => 'Client-Peer',
   'x-guploader-uploadid' => 'X-GUploader-UploadID',
   'alternate-protocol' => 'Alternate-Protocol',
   'alt-svc' => 'Alt-Svc',
   'client-ssl-cert-issuer' => 'Client-SSL-Cert-Issuer',
   'client-date' => 'Client-Date',
   'client-response-num' => 'Client-Response-Num'
 },
'client-ssl-cert-issuer' => '/C=US/O=Google Inc/CN=Google Internet Authority G2',
'server' => 'UploadServer',
'client-date' => 'Mon, 19 Oct 2015 14:45:55 GMT',
'client-ssl-socket-class' => 'IO::Socket::SSL',
'client-ssl-cipher' => 'ECDHE-ECDSA-AES128-GCM-SHA256',
'client-peer' => '2a00:1450:400c:c08::84:443',
'alt-svc' => 'quic=":443"; p="1"; ma=604800',
'cache-control' => 'private, max-age=0',
'client-response-num' => 1
}, 'HTTP::Headers' );

我希望在上面找到content-disposition标头。 另一方面,wget正确地给出了文件名:

#wget --content-disposition "https://drive.google.com/uc?export=download&id=0B6vqTWO9kmdmdzk5ejhDSXgzMDg"
--2015-10-19 20:21:37--  https://drive.google.com/uc?export=download&id=0B6vqTWO9kmdmdzk5ejhDSXgzMDg
Resolving drive.google.com (drive.google.com)... 2a00:1450:400c:c04::64, 74.125.206.139, 74.125.206.101, ...
Connecting to drive.google.com (drive.google.com)|2a00:1450:400c:c04::64|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-0s-2s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/unidk5uvfpl9kut1rl3hb5lqcvis8vdq/1445263200000/06380472059566149580/*/0B6vqTWO9kmdmdzk5ejhDSXgzMDg?e=download [following]
Warning: wildcards not supported in HTTP.
--2015-10-19 20:21:37--  https://doc-0s-2s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/unidk5uvfpl9kut1rl3hb5lqcvis8vdq/1445263200000/06380472059566149580/*/0B6vqTWO9kmdmdzk5ejhDSXgzMDg?e=download
Resolving doc-0s-2s-docs.googleusercontent.com (doc-0s-2s-docs.googleusercontent.com)... 2a00:1450:400c:c08::84, 74.125.140.132
Connecting to doc-0s-2s-docs.googleusercontent.com (doc-0s-2s-docs.googleusercontent.com)|2a00:1450:400c:c08::84|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16 [text/plain]
Saving to: ‘testdoc.txttestdoc.txt’

testdoc.txttestdoc.txt                              100%[====================================================================================================================>]      16  --.-KB/s   in 0s

2015-10-19 20:21:38 (550 KB/s) - ‘testdoc.txttestdoc.txt’ saved [16/16]

如何使用perl从服务器获取正确的文件名?

2 个答案:

答案 0 :(得分:3)

不同之处在于你没有使用wget进行HEAD。如果您在Perl代码中查看响应的状态,那么

503 Service Unavailable

可以引用很多东西,但在这种情况下意味着不支持HEAD。更改GET并且一切正常

use strict;
use warnings;

use feature 'say';

use LWP::UserAgent;

my $str = 'https://drive.google.com/file/d/0B6vqTWO9kmdmdzk5ejhDSXgzMDg/view?usp=sharing';
die unless $str =~ m{file/d/(\w+)};

my $url = "https://drive.google.com/uc?export=download&id=$1";

my $ua = LWP::UserAgent->new;

my $resp = $ua->get($url);

say $resp->status_line;

say $resp->header('Content-Disposition');

输出

200 OK
attachment;filename="testdoc.txt";filename*=UTF-8''testdoc.txt

答案 1 :(得分:0)

差异的原因是:

HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-0s-2s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/unidk5uvfpl9kut1rl3hb5lqcvis8vdq/1445263200000/06380472059566149580/*/0B6vqTWO9kmdmdzk5ejhDSXgzMDg?e=download [following]

您的初始网址已被重定向 - 因此,您要获取标题的内容不是正在下载的文件wget。尝试阅读status_line你的头部请求,看看你得到了什么。