Question

Why does WWW::Mechanize return empty content after fetching the following URL? A browser or curl retrieves the full HTML page.

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;
$mech->get("http://www.belizejudiciary.org/web/judgements2/");
print $mech->content;  # prints nothing

Here is a dump of the response:

HTTP/1.1 200 OK
Connection: close
Date: Fri, 10 Feb 2017 00:51:47 GMT
Server: Apache/2.4
Content-Type: text/html; charset=UTF-8
Client-Aborted: die
Client-Date: Fri, 10 Feb 2017 00:51:48 GMT
Client-Peer: 98.129.229.64:80
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
Link: <http://www.belizejudiciary.org/web/wp-json/>; rel="https://api.w.org/"
Link: <http://www.belizejudiciary.org/web/?p=468>; rel=shortlink
Set-Cookie: X-Mapping-hepadkon=FAB86566672CEB74D66B2818CA030616; path=/
X-Died: Illegal field name 'X-Meta-Twitter:title' at /usr/local/lib/perl5/site_perl/5.16.3/sun4-solaris/HTML/HeadParser.pm line 207.
X-Pingback: http://www.belizejudiciary.org/web/xmlrpc.php

I have version 3.70 of HTML::Parser installed.

Answer 1

Your dump shows an error that occurred while parsing the response:

X-Died: Illegal field name 'X-Meta-Twitter:title' at /usr/local/lib/perl5/site_perl/5.16.3/sun4-solaris/HTML/HeadParser.pm line 207.

This is caused by a bug in HTML::HeadParser:

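A <meta> tag may have a name attribute that contains a colon, which is perfectly valid. HTML::HeadParser, however, then tries to register each such tag as an X-Meta-<name> header using HTTP::Headers. Newer versions of HTTP::Headers (since 6.05) check field names more strictly and reject any name that contains a colon. The resulting exception aborts the response handling (hence "Client-Aborted: die"), which is why the content is empty.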
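To see the rejection in isolation, here is a minimal sketch that tries to set the same field name directly on an HTTP::Headers object. The helper name try_field_name is made up for illustration, and the result depends on which HTTP::Headers version you have installed: versions before 6.05 are expected to accept the name, later ones to reject it.

```perl
use strict;
use warnings;
use HTTP::Headers;

# Hypothetical helper: try to set a header with the given field name.
# Returns the error text if HTTP::Headers rejects the name,
# or an empty string if the installed version accepts it.
sub try_field_name {
    my ($name) = @_;
    my $headers = HTTP::Headers->new;
    my $ok = eval { $headers->header($name => 'example value'); 1 };
    return $ok ? '' : $@;
}

# The same name HTML::HeadParser builds for <meta name="twitter:title" ...>
my $err = try_field_name('X-Meta-Twitter:title');

print "HTTP::Headers version: ", HTTP::Headers->VERSION, "\n";
print $err ? "rejected: $err" : "accepted\n";
```

If this prints a "rejected" line on your machine, your HTTP::Headers is strict about colons, and an unpatched HTML::HeadParser will trip over any page that uses such <meta> names.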

This was fixed in version 3.71 of the HTML-Parser distribution, so you should upgrade.
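If you cannot upgrade right away, one possible stopgap (my suggestion, not something the dump itself points to) is to turn off head parsing with the parse_head method that WWW::Mechanize inherits from LWP::UserAgent, so HTML::HeadParser is never invoked. The sketch below also prints the installed HTML::Parser version so you can confirm whether you already have the fix; the actual fetch is left commented out because it needs network access.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::Parser;

# HTML::HeadParser ships in the HTML-Parser distribution, so this
# version number tells you whether the 3.71 fix is installed.
print "HTML::Parser version: ", HTML::Parser->VERSION, "\n";

# Stopgap (an assumption, not part of the original answer):
# skip <head> parsing so HTML::HeadParser never sees the <meta> tags.
my $mech = WWW::Mechanize->new;
$mech->parse_head(0);    # method inherited from LWP::UserAgent

# Uncomment to fetch the page from the question:
# $mech->get("http://www.belizejudiciary.org/web/judgements2/");
# print $mech->content;
```

The trade-off is that the response object will no longer carry headers derived from the document's <head> section, so upgrading HTML-Parser remains the cleaner fix.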
