I get a "Wide character in print" warning even though the encoding is set to utf8:
use WWW::Mechanize;
my $mech = WWW::Mechanize->new;
$mech->get("http://www.ilo.org/dyn/triblex/triblexmain.fullText?p_lang=en&p_judgment_no=88&p_language_code=FR");
$mech->save_content("output.html", binmode => ':encoding(UTF-8)');
What is the solution to this problem?
Answer 0 (score: 0)
It seems that :encoding(UTF-8) is not being interpreted correctly.
$mech->save_content($filename, binmode => ':raw:utf8');
should work.
UPD: For information about the binmode directives (Perl I/O layers), see: http://perldoc.perl.org/PerlIO.html
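To make the role of I/O layers more concrete, here is a minimal sketch of when Perl emits "Wide character in print". It does not use WWW::Mechanize at all; the sample text and the in-memory filehandles are assumptions chosen for illustration, and it only shows the general behaviour of a handle with and without an encoding layer.

```perl
use strict;
use warnings;

# Sample text (an assumption for illustration); U+2013 EN DASH is a "wide" character.
my $text = "Jugement n\x{b0} 88 \x{2013} Tribunal administratif";

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Handle with no encoding layer: printing a wide character triggers the warning.
open my $plain, '>', \my $plain_buf or die "open failed: $!";
print {$plain} $text;
close $plain;
my $warned_plain = grep { /Wide character/ } @warnings;

# Handle with an explicit encoding layer: characters are encoded on output.
@warnings = ();
open my $layered, '>:encoding(UTF-8)', \my $layered_buf or die "open failed: $!";
print {$layered} $text;
close $layered;
my $warned_layered = grep { /Wide character/ } @warnings;

print "no layer:         ", ($warned_plain   ? "warning" : "no warning"), "\n";
print ":encoding(UTF-8): ", ($warned_layered ? "warning" : "no warning"), "\n";
```

If a handle that is supposed to carry a layer still produces the warning, that suggests the layer was never applied to the handle that actually does the printing.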
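Answer 1 (score: 0)
I can't reproduce the warning you're getting. If I run the code as you currently show it, I get no warnings at all. But there does seem to be a problem with that site. I wrote this program, which uses LWP and skips WWW::Mechanize altogether: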
use strict;
use warnings 'all';
binmode STDOUT, ':utf8';
use LWP;
use constant URL => 'http://www.ilo.org/dyn/triblex/triblexmain.fullText?p_lang=en&p_judgment_no=88&p_language_code=FR';
my $ua = LWP::UserAgent->new;
my $res = $ua->get(URL);
print $res->headers_as_string, "\n";
my $n = 0;
for my $chr ( unpack '(A1)*', $res->decoded_content ) {
    my $ord = ord $chr;
    printf "%4d: %04x\n", ++$n, $ord if $ord >= 0x7f;
}
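The response has perfectly reasonable headers, but every non-ASCII character in the response body that should be an accented letter is a Unicode U+FFFD REPLACEMENT CHARACTER. That shouldn't cause a "Wide character in print" warning, but it is certainly wrong. Please check that you have the latest versions of LWP and WWW::Mechanize. This is the output: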
Connection: Keep-Alive
Date: Thu, 17 Mar 2016 23:15:12 GMT
Via: 1.1 www.ilo.org
Server: Oracle-Application-Server-10g/10.1.3.5.0 Oracle-HTTP-Server
Vary: Accept-Encoding,User-Agent
Content-Length: 10132
Content-Type: text/html; charset=UTF-8
Client-Date: Thu, 17 Mar 2016 23:15:12 GMT
Client-Peer: 193.134.195.36:80
Client-Response-Num: 1
Keep-Alive: timeout=5, max=98
Title: Jugement No 88 (TAOIT) - Tribunal administratif
1: fffd
2: fffd
3: fffd
4: fffd
5: fffd
6: fffd
7: fffd
8: fffd
9: fffd
10: fffd
... etc. to
316: fffd
317: fffd
318: fffd
319: fffd
320: fffd
321: fffd
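
As a point of comparison, the sketch below shows one common way that U+FFFD ends up in decoded text: bytes in a single-byte encoding are decoded as if they were UTF-8. Whether that is what happens with this site is an assumption, not something the output above proves; the sample bytes are hypothetical and are not taken from the actual response.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "été" encoded as ISO-8859-1, so each "é" is the single byte 0xE9.
my $latin1_bytes = "\xe9t\xe9";

# Decode the bytes as if they were UTF-8, which is what a
# "charset=UTF-8" header asks the client to do.
my $as_utf8 = decode('UTF-8', $latin1_bytes);

# Decode the same bytes with the encoding they were really written in.
my $as_latin1 = decode('ISO-8859-1', $latin1_bytes);

# Count the U+FFFD REPLACEMENT CHARACTERs produced by the wrong decoding.
my $replaced = () = $as_utf8 =~ /\x{FFFD}/g;

print "decoded as UTF-8:\n";
printf "  %04x\n", ord($_) for split //, $as_utf8;
print "decoded as ISO-8859-1:\n";
printf "  %04x\n", ord($_) for split //, $as_latin1;
print "replacement characters: $replaced\n";
```

By default, Encode's decode substitutes U+FFFD for malformed input rather than dying, so the accented letters are lost at decode time and cannot be recovered by changing the output layer afterwards.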