"Wide character in print" warning even with UTF-8 encoding

Time: 2016-03-17 05:07:03

Tags: perl utf-8 character-encoding mechanize

I get a "Wide character in print" warning even though the encoding is set to UTF-8:

use WWW::Mechanize;
$mech = new WWW::Mechanize;
$mech->get("http://www.ilo.org/dyn/triblex/triblexmain.fullText?p_lang=en&p_judgment_no=88&p_language_code=FR");
$mech->save_content("output.html", binmode => ':encoding(UTF-8)');

How can I fix this?
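For context, the warning is raised by Perl itself whenever a string containing a character above U+00FF is printed to a handle that has no encoding layer; Perl then writes the UTF-8 bytes anyway and complains. The minimal, network-free sketch below (not the poster's code; the helper name print_through and the sample text are illustrative assumptions) prints the same text through an in-memory handle with and without an :encoding(UTF-8) layer and counts the warnings.

```perl
use strict;
use warnings;

# Print $text to an in-memory filehandle opened with the given layers and
# return the bytes written plus any warnings Perl emitted while printing.
sub print_through {
    my ($layers, $text) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    my $buf = '';
    open my $fh, ">$layers", \$buf or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "Cannot close in-memory handle: $!";
    return ($buf, \@warnings);
}

# U+FFFD is above U+00FF, so it cannot be written as a single byte.
my $text = "R\x{E9}sum\x{E9} \x{FFFD}\n";

for my $layers (':raw', ':encoding(UTF-8)') {
    my ($bytes, $warnings) = print_through($layers, $text);
    printf "%-17s %d bytes, %d warning(s)\n",
        $layers, length $bytes, scalar @$warnings;
}
```

Both handles end up with the same UTF-8 bytes; only the handle without an encoding layer produces the warning.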

2 answers:

Answer 0 (score: 0)

It seems :encoding(UTF-8) is not being interpreted correctly. Try this instead:

$mech->save_content($filename, binmode => ':raw:utf8');

This should work.

Update: for information about binmode directives (Perl I/O layers), see http://perldoc.perl.org/PerlIO.html
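If the installed WWW::Mechanize does not honour the binmode option of save_content (an assumption, not confirmed in this thread), another route is to write the page yourself with an explicit layer. The helper save_text below is hypothetical and not part of WWW::Mechanize; it assumes $mech->content returns the decoded character string, as WWW::Mechanize normally does for text/html responses. The layer defaults to :encoding(UTF-8), and :raw:utf8 from the answer above can be passed instead.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize: write an already-decoded
# (character) string to $filename through an explicit PerlIO layer stack.
sub save_text {
    my ($filename, $text, $layers) = @_;
    $layers //= ':encoding(UTF-8)';    # default to the strict UTF-8 encoding layer
    open my $fh, ">$layers", $filename or die "Cannot open $filename: $!";
    print {$fh} $text;
    close $fh or die "Cannot close $filename: $!";
    return;
}

# Example calls with a WWW::Mechanize object (not run here):
#   save_text('output.html', $mech->content);
#   save_text('output.html', $mech->content, ':raw:utf8');
```

For well-formed character strings both layer stacks write identical bytes; :encoding(UTF-8) is the more common recommendation because it enforces strict UTF-8, while :utf8 only marks the handle.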

Answer 1 (score: 0)

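I can't reproduce the warning you're getting. If I run the code you show now, I get no warnings at all. But there does seem to be something wrong with the site. I wrote this program, which uses LWP and skips WWW::Mechanize entirely: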

use strict;
use warnings 'all';

binmode STDOUT, ':utf8';

use LWP;

use constant URL => 'http://www.ilo.org/dyn/triblex/triblexmain.fullText?p_lang=en&p_judgment_no=88&p_language_code=FR';

my $ua = LWP::UserAgent->new;

my $res = $ua->get(URL);

print $res->headers_as_string, "\n";

my $n = 0;
for my $chr ( unpack '(A1)*', $res->decoded_content ) {
    my $ord = ord $chr;
    printf "%4d: %04x\n", ++$n, $ord if $ord >= 0x7f;
}

The response has perfectly reasonable headers, but every accented non-ASCII character in the response body is U+FFFD REPLACEMENT CHARACTER.

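That shouldn't cause a "Wide character in print" warning, but it is certainly wrong. Please check that you have the latest versions of LWP and WWW::Mechanize. This is the program's output: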

Connection: Keep-Alive
Date: Thu, 17 Mar 2016 23:15:12 GMT
Via: 1.1 www.ilo.org
Server: Oracle-Application-Server-10g/10.1.3.5.0 Oracle-HTTP-Server
Vary: Accept-Encoding,User-Agent
Content-Length: 10132
Content-Type: text/html; charset=UTF-8
Client-Date: Thu, 17 Mar 2016 23:15:12 GMT
Client-Peer: 193.134.195.36:80
Client-Response-Num: 1
Keep-Alive: timeout=5, max=98
Title: Jugement No 88 (TAOIT) - Tribunal administratif

   1: fffd
   2: fffd
   3: fffd
   4: fffd
   5: fffd
   6: fffd
   7: fffd
   8: fffd
   9: fffd
  10: fffd
... etc. to
 316: fffd
 317: fffd
 318: fffd
 319: fffd
 320: fffd
 321: fffd
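Replacement characters like these are typically what a lenient UTF-8 decoder produces for bytes that are not valid UTF-8. The sketch below assumes, without having checked the site, that the accented letters arrive as single Latin-1 bytes while the header claims charset=UTF-8. It uses Encode to decode sample bytes leniently (each malformed byte becomes U+FFFD) and strictly (FB_CROAK makes decode die). Running the hypothetical check_utf8 helper on $res->content, the raw undecoded body, would show whether the server actually sends valid UTF-8. Note that U+FFFD lies above U+00FF, so such text also triggers "Wide character in print" if it reaches a handle with no encoding layer.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes as UTF-8 leniently (malformed bytes
# become U+FFFD) and strictly (malformed bytes make decode() die).
sub check_utf8 {
    my ($bytes) = @_;
    my $lenient = decode('UTF-8', $bytes, Encode::FB_DEFAULT | Encode::LEAVE_SRC);
    my $valid   = eval { decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 } ? 1 : 0;
    return ($lenient, $valid);
}

# Sample input (an assumption, not taken from the site): French text whose
# accented letters are single Latin-1 bytes (0xE9) instead of UTF-8.
my ($text, $valid) = check_utf8("Le Tribunal a d\xE9cid\xE9");

printf "valid UTF-8: %s\n", $valid ? 'yes' : 'no';
printf "U+%04X\n", ord $_ for grep { ord($_) > 0x7f } split //, $text;
```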