Perl LWP::UserAgent seemingly randomly hangs for 120 seconds on a given server

Time: 2012-09-12 15:31:39

Tags: perl www-mechanize lwp lwp-useragent

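With a given https website, I see a delay of about 120 seconds between the response_data and response_done events in WWW::Mechanize. I checked with a normal web browser and did not experience this slowness, so I suspect I must be doing something wrong.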

Here is what I did to trace the events (for some reason, use LWP::Debug qw(+) did not do anything):

use WWW::Mechanize;
use Time::HiRes qw(gettimeofday);
use IO::Handle;

my $mech = WWW::Mechanize->new(
  timeout     => 3,
  autocheck   => 1,       # check success of each query
  stack_depth => 0,       # no keeping history
  keep_alive  => 50,      # connection pool
);

$mech->agent_alias( 'Windows IE 6' );
open my $debugfile, '>', 'traffic.txt'
  or die "Cannot open traffic.txt: $!";   # three-argument open with error check
$debugfile->autoflush(1);

$mech->add_handler( request_send => sub {
    my $cur_time = gettimeofday();
    my $req = shift;
    print $debugfile "\n$cur_time === BEGIN HTTP REQUEST ===\n";
    print $debugfile $req->dump();
    print $debugfile "\n$cur_time ===   END HTTP REQUEST ===\n";
    return
  }
);
$mech->add_handler( response_header => sub {
    my $cur_time = gettimeofday();
    my $res = shift;
    print $debugfile "\n$cur_time === GOT RESPONSE HDRS ===\n";
    print $debugfile $res->dump();
    return
  }
);
$mech->add_handler( response_data => sub {
    my $cur_time = gettimeofday();
    my $res = shift;
    my $content_length = length($res->content);
    print $debugfile "$cur_time === Got response data chunk resp size = $content_length ===\n";
    return
  }
);
$mech->add_handler( response_done => sub {
    my $cur_time = gettimeofday();
    my $res = shift;
    print $debugfile "\n$cur_time === BEGIN HTTP RESPONSE ===\n";
    print $debugfile $res->dump();
    print $debugfile "\n===   END HTTP RESPONSE ===\n";
    return
  }
);

Here is an excerpt of the trace (URLs and cookies are obfuscated):

1347463214.24724 === BEGIN HTTP REQUEST ===
GET https://...
Accept-Encoding: gzip
Referer: https://...
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Cookie: ...
Cookie2: $Version="1"

(no content)

1347463214.24724 ===   END HTTP REQUEST ===

1347463216.13134 === GOT RESPONSE HDRS ===
HTTP/1.1 200 OK
Date: Wed, 12 Sep 2012 15:20:08 GMT
Accept-Ranges: bytes
...
Server: Lotus-Domino
Content-Length: 377806
Content-Type: application/octet-stream
Last-Modified: Fri, 07 Sep 2012 06:25:33 GMT
Client-Peer: ...
Client-Response-Num: 1
Client-SSL-Cert-Issuer: ...
Client-SSL-Cert-Subject: ...
Client-SSL-Cipher: DES-CBC3-SHA
Client-SSL-Socket-Class: IO::Socket::SSL

(no content)
1347463216.48305 === Got response data chunk resp size = 4096 ===

1347463337.98131 === BEGIN HTTP RESPONSE ===
HTTP/1.1 200 OK
Date: Wed, 12 Sep 2012 15:20:08 GMT
Accept-Ranges: bytes
...
Server: Lotus-Domino
Content-Length: 377806
Content-Type: application/octet-stream
Last-Modified: Fri, 07 Sep 2012 06:25:33 GMT
Client-Date: Wed, 12 Sep 2012 15:22:17 GMT
Client-Peer: ...
Client-Response-Num: 1
Client-SSL-Cert-Issuer: ...
Client-SSL-Cert-Subject: ...
Client-SSL-Cipher: DES-CBC3-SHA
Client-SSL-Socket-Class: IO::Socket::SSL

PK\3\4\24\0\6\0\10\0\0\0!\0\x88\xBC\21Xi\2\0\0\x84\22\0\0\23\0\10\2[Content_Types].xml \xA2...
(+ 377294 more bytes not shown)

===   END HTTP RESPONSE ===

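Between the "Got response data chunk" and "BEGIN HTTP RESPONSE" messages you can see a gap of 121.5 seconds. I have the feeling that LWP::UserAgent sometimes hangs for two minutes after receiving the whole amount of data.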

Do you have any clue where this could come from?

EDIT: here is a screenshot from Wireshark. I get the FIN/ACK message after 120 seconds...

[Image: Wireshark excerpt]

Thanks

4 Answers:

Answer 0: (score: 3)

I think it is likely that your transaction really is taking that long. The documentation for LWP::UserAgent says this:

  

"[The response_data handler] needs to return a TRUE value to be called again for subsequent chunks for the same request."

So, because your handler returns nothing, you are tracing only the first data chunk to come back.
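As a minimal sketch of what the quoted documentation asks for, the handler below records the size of every chunk and returns a true value so that LWP keeps calling it. The helper name make_chunk_logger and the @sizes array are hypothetical, introduced only for illustration; the argument order ($response, $ua, $handler, $data) follows the LWP::UserAgent handler documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a response_data handler that records the
# size of each chunk it is handed.
sub make_chunk_logger {
    my ($sizes) = @_;    # array ref that collects one entry per chunk
    return sub {
        my ($response, $ua, $handler, $data) = @_;
        push @$sizes, length $data;
        return 1;        # TRUE: ask to be called again for the next chunk
    };
}

# Possible wiring, assuming the $mech object from the question:
#   my @sizes;
#   $mech->add_handler( response_data => make_chunk_logger(\@sizes) );
```

With a handler like this, the trace should show one line per chunk rather than only the first one.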

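From your output, the first 4KB of data arrived in 2.2 seconds, or roughly 2KB per second. The entire data is 369KB long, so you should expect to receive 92 more data chunks, and at 2KB per second the transfer would take about three minutes. You get your response in two minutes, so I think your times are reasonable.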
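The estimate above can be reproduced from the trace. This is only a back-of-the-envelope sketch: it assumes the time to the first chunk reflects the sustained transfer rate, which may not hold since that interval also includes the request round trip. The timestamps and sizes are copied from the question's trace; the variable names are illustrative.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Values taken from the trace in the question.
my $t_request   = 1347463214.24724;   # request sent
my $t_first     = 1347463216.48305;   # first data chunk seen
my $chunk_bytes = 4096;               # size of the first chunk
my $total_bytes = 377806;             # Content-Length

my $elapsed     = $t_first - $t_request;                    # seconds to first chunk
my $rate        = $chunk_bytes / $elapsed;                  # bytes per second
my $chunks_left = ceil($total_bytes / $chunk_bytes) - 1;    # chunks after the first
my $eta         = $total_bytes / $rate;                     # seconds for the whole body

printf "first chunk after %.2f s, about %.1f KB/s\n", $elapsed, $rate / 1024;
printf "%d more chunks expected, whole body in about %.0f s\n", $chunks_left, $eta;
```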

Answer 1: (score: 3)

Thanks to Borodin's answer, I found a way to fix the problem:

I modified the response_data event handler this way:

if($res->header('Content-Length') == length($res->content)) {
    die "OK"; # Got whole data, not waiting for server to end the communication channel.
}
return 1; # In other cases make sure the handler is called for subsequent chunks

Then, in the caller, I ignore the error if the X-Died header is equal to OK.
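One detail that may matter when checking the header: Perl's die appends " at SCRIPT line N." to a message that does not end in a newline, so the X-Died value is likely to start with "OK" rather than be exactly "OK". The sketch below factors the two checks into helpers; body_complete and aborted_on_purpose are hypothetical names, not part of LWP.

```perl
use strict;
use warnings;

# Hypothetical helper: true once the accumulated body length matches
# the Content-Length header (which may be missing or non-numeric).
sub body_complete {
    my ($content_length, $received) = @_;
    return 0 unless defined $content_length && $content_length =~ /^\d+$/;
    return $content_length == $received ? 1 : 0;
}

# Hypothetical helper: true if an X-Died value looks like it came from
# the deliberate die "OK" in the handler. Matches the "OK" prefix
# because die adds the script name and line number to the message.
sub aborted_on_purpose {
    my ($x_died) = @_;
    return defined $x_died && $x_died =~ /^OK\b/ ? 1 : 0;
}

# Possible use, assuming $res is the response returned to the caller:
#   my $deliberate = aborted_on_purpose( $res->header('X-Died') );
```

Alternatively, ending the message with a newline (die "OK\n") should keep the location suffix out of the header.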

Answer 2: (score: 2)

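I know this is very old now, but I recently ran into what seems to be the same problem. It only occurred when the size of the unencrypted HTTPS response, including the headers, was exactly 1024 bytes. Benoit seems to have had a response chunk of exactly 4096 bytes, so multiples of 1024 may be significant. I don't control the server, so I couldn't produce test responses of arbitrary lengths, nor could I reproduce the problem on any other server. The occurrence at 1024 bytes was repeatable, though.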

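Looking around the LWP code (v6.05), I found that sysread is asked to read 1024 bytes at a time. So the first call returns all 1024 bytes. It is then immediately called a second time, and instead of returning 0, indicating that there is no more data, it returns undef, indicating an error, and sets errno to EAGAIN, indicating that there is more data but it isn't available yet. This leads to a select on the socket, which hangs because there is no more data. The timeout takes 120 seconds, after which the data we have is returned, which happens to be the correct result. So we get no error, just a long delay.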
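The difference between the two sysread outcomes described above can be seen on an ordinary non-blocking socket. This is only an illustration of the return-value convention: it uses a plain Unix socketpair rather than IO::Socket::SSL, assumes a POSIX system, and does not reproduce the bug itself.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use Errno;    # makes sure %! is populated

# A connected pair of stream sockets; the reader is non-blocking.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";
$reader->blocking(0);

# Send exactly 1024 bytes, the size LWP asks for in one read.
my $sent = syswrite($writer, 'x' x 1024);
die "short write" unless defined $sent && $sent == 1024;

my $buf = '';
my $first = sysread($reader, $buf, 1024);    # reads the 1024 bytes

# Nothing left, but the peer is still open: undef plus EAGAIN/EWOULDBLOCK.
my $second      = sysread($reader, $buf, 1024, length $buf);
my $would_block = !defined($second) && ($!{EAGAIN} || $!{EWOULDBLOCK}) ? 1 : 0;

# After the peer closes, the same call reports end of file with 0.
close $writer;
my $third = sysread($reader, $buf, 1024, length $buf);

printf "first=%s would_block=%s third=%s\n",
    $first // 'undef', $would_block, $third // 'undef';
```

A caller that sees undef with EAGAIN has to wait for more data, which is where a select with a long timeout comes in; a return value of 0 lets it stop immediately.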

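I didn't have convenient enough access to use Benoit's solution. Instead, my workaround was to extend the HTTPS handling code to check for the above situation and return 0 instead of undef: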

package LWP::Protocol::https::Socket;

sub sysread {
    my $self = shift;
    my $result = $self->SUPER::sysread(@_);
    # If we get undef back then some error occurred. If it's EAGAIN
    # then that ought to mean that there is more data to read but
    # it's not available yet. We suspect the error may be false.
    # $_[2] is the offset, so if it's defined and non-zero we have
    # some data in the buffer.
    # $_[0] is the buffer, so check it for an entire HTTP response,
    # including the headers and the body. If the length specified
    # by Content-Length is exactly the length of the body we have in
    # the buffer, then take that as being complete and return a length
    # here instead. Since it's unlikely that anything was read, the
    # buffer will not have increased in size and the result will be zero
    # (which was the expected result anyway).
    if (!defined($result) &&
        $!{EAGAIN} &&
        $_[2] &&
        $_[0] =~ /^HTTP\/\d+\.\d+\s+\d+\s+.*\s+content-length\s*:\s*(\d+).*?\r?\n\r?\n(.*)$/si &&
        length($2) == $1) {
            return length($_[0]) - $_[2]; # bufferlen - offset
    }
    return $result;
}

Answer 3: (score: 1)

Alan, I am seeing the same behavior on my system, for content lengths of 1024, 2048, 3072 bytes, and so on.

The solution to this problem is to upgrade Net::HTTP to version 6.09 or later.
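To see whether an installation is affected, the installed Net::HTTP version can be compared against 6.09. A minimal sketch follows; needs_upgrade is a hypothetical helper name, and the 6.09 threshold is taken from this answer rather than verified independently.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: true if the given version is older than 6.09,
# the release this answer names as containing the fix.
sub needs_upgrade {
    my ($installed) = @_;
    return version->parse($installed) < version->parse('6.09') ? 1 : 0;
}

# Load Net::HTTP if it is available and report what was found.
my $installed = eval { require Net::HTTP; Net::HTTP->VERSION };
if (!defined $installed) {
    print "Net::HTTP does not appear to be installed\n";
}
elsif (needs_upgrade($installed)) {
    print "Net::HTTP $installed is older than 6.09; an upgrade may help\n";
}
else {
    print "Net::HTTP $installed is at or above 6.09\n";
}
```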