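I'm developing a perl + Mojolicious web application. My front-end sends a POST query whose parameter "a" contains accented characters (for example "été") using charset utf-8, as I can see in the Chrome network tab. But the server-side script decodes that parameter with a charset I did not expect.
I wrote the following script to reproduce the situation.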
use utf8; # script encoded in UTF-8 without BOM
use Mojolicious::Lite;
use Data::HexDump;
{
    require Mojolicious;
    say "perl $^V, Mojolicious: v", Mojolicious->VERSION, ", ", `chcp` ;
}
post '/' => sub {
    my $self = shift;
    my $params = $self->req->params->to_hash;
    app->log->debug("received data:\n", HexDump( $params->{a} ) );
    use Devel::Peek;
    Dump( $params->{a} );
    $self->render( text => "ok for '$params->{a}'" );
};
if (my $pid = fork()) {
    use Mojo::UserAgent;
    my $t = Mojo::UserAgent->new;
    # simulate front-end query
    my $tx = $t->post('http://127.0.0.1:3042/' =>
        { 'Content-Type' => 'application/x-www-form-urlencoded; charset=UTF-8' },
        form => { a => 'été' }
    );
    my $res = $tx->res->body;
    say "result:\n", HexDump($res);
    use Devel::Peek;
    Dump( $res );
    kill 'SIGKILL', $pid;
    exit(0);
}
app->start(qw(daemon --listen http://*:3042 ));
The output of this script is:
perl v5.20.1, Mojolicious: v6.05, Page de codes active : 850
[Tue May 26 12:31:15 2015] [info] Listening at "http://*:3042"
Server available at http://127.0.0.1:3042
[Tue May 26 12:31:16 2015] [debug] Your secret passphrase needs to be changed
[Tue May 26 12:31:16 2015] [debug] POST "/"
[Tue May 26 12:31:16 2015] [debug] Routing to a callback
[Tue May 26 12:31:16 2015] [debug] received data:
00 01 02 03 04 05 06 07 - 08 09 0A 0B 0C 0D 0E 0F 0123456789ABCDEF
00000000 E9 74 E9 .t.
SV = PVMG(0x5a7a198) at 0x4dce730
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
IV = 0
NV = 0
PV = 0x5b62c48 "\303\251t\303\251"\0 [UTF8 "\x{e9}t\x{e9}"]
CUR = 5
LEN = 10
[Tue May 26 12:31:16 2015] [debug] 200 OK (0.005052s, 197.941/s)
result:
00 01 02 03 04 05 06 07 - 08 09 0A 0B 0C 0D 0E 0F 0123456789ABCDEF
00000000 6F 6B 20 66 6F 72 20 27 - C3 A9 74 C3 A9 27 ok for '..t..'
SV = PV(0x41a73e8) at 0x4927070
REFCNT = 1
FLAGS = (PADMY,POK,IsCOW,pPOK)
PV = 0x5aa1328 "ok for '\303\251t\303\251'"\0
CUR = 14
LEN = 16
COW_REFCNT = 1
So we can see that the server received parameter "a" as a string flagged utf8 whose buffer contains "\x{e9}t\x{e9}".
I was expecting "été" as the hex bytes "C3 A9 74 C3 A9".
What is wrong?
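Answer 0 (score: 1)
"été" is "\xE9t\xE9"; they are the same string. A Perl Unicode string is a sequence of code points (ordinals), not a sequence of UTF-8 bytes: the incoming UTF-8 is decoded into code points, and UTF-8 is just one way of encoding those code points. (The PV field in your Devel::Peek dump shows Perl's internal storage, which is UTF-8 when the UTF8 flag is set, but length, ord and string comparison all work on characters, which is why HexDump shows E9 74 E9.)
é is ordinal 233; see the Wikipedia entry for é (and the updated program below).
So été is C3 A9 74 C3 A9 only when it is encoded as UTF-8. As numbers/ordinals, été is 233 116 233, which as a Perl Unicode string is \xE9t\xE9, because 233 is E9 in hexadecimal.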
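A minimal sketch of that point, using only core Perl: the same three-character string can be held internally either as single bytes or as UTF-8, and the two still compare equal. The variable names are illustrative and not taken from the question.

```perl
use strict;
use warnings;

# No "use utf8" needed: the literal is built from \x escapes only.
my $narrow = "\xE9t\xE9";     # held internally as 3 single bytes, UTF8 flag off

my $wide = $narrow;
utf8::upgrade($wide);         # same characters, internal storage switched to UTF-8

printf "UTF8 flag: narrow=%d wide=%d\n",
    (utf8::is_utf8($narrow) ? 1 : 0), (utf8::is_utf8($wide) ? 1 : 0);
printf "length:    narrow=%d wide=%d\n", length($narrow), length($wide);
printf "equal:     %s\n", ($narrow eq $wide ? 'yes' : 'no');
printf "ordinals:  %s\n", join ' ', map { ord($_) } split //, $wide;
```

Both strings have length 3 and ordinals 233 116 233; only the internal representation differs, which is what the UTF8 flag in a Devel::Peek dump reports.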
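Update: earlier I created the UTF-8 file 2 with an editor; here it is created with perl. You can see that it contains exactly the bytes you expect, and that reading it as UTF-8 or as raw bytes gives different results.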
$ perl -CS -e " print chr(233), chr(116), chr(233) " >2
$ od -tx1 2
0000000 c3 a9 74 c3 a9
0000005
$ type 2
été
$
$ perl -MData::Dump -MPath::Tiny -e " dd ( path(2)->slurp_raw ) "
"\xC3\xA9t\xC3\xA9"
$ perl -MData::Dump -MPath::Tiny -e " dd ( path(2)->slurp_utf8 ) "
"\xE9t\xE9"
$ perl -MData::Dump -MPath::Tiny -e " dd( map { [ $_, ord$_ ] } split //, path(2)->slurp_utf8 ) "
(["\xE9", 233], ["t", 116], ["\xE9", 233])
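The same raw-versus-decoded difference can be reproduced without a file. This is a sketch using the core Encode module; hex_of is a hypothetical helper written for this example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: print each element of a string as two hex digits.
sub hex_of {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $s;
}

# The five bytes that UTF-8 uses for "été".
my $bytes = "\xC3\xA9t\xC3\xA9";

# Decoding turns bytes into characters (code points).
my $chars = decode('UTF-8', $bytes);

# Encoding turns characters back into bytes.
my $round_trip = encode('UTF-8', $chars);

printf "raw bytes:  length %d, %s\n", length($bytes),      hex_of($bytes);
printf "decoded:    length %d, %s\n", length($chars),      hex_of($chars);
printf "re-encoded: length %d, %s\n", length($round_trip), hex_of($round_trip);
```

The decoded string has length 3 and prints as E9 74 E9; the raw and re-encoded strings have length 5 and print as C3 A9 74 C3 A9.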
Answer 1 (score: 1)
U+00E9 is the code point of é. c3 a9 is its UTF-8 encoding. To see the UTF-8 encoded form of 'é', you need to UTF-8 encode it. For example:
#!/usr/bin/env perl -l
use utf8;
use strict;
use warnings;
use Unicode::UTF8 qw( encode_utf8 );
binmode STDOUT, ':encoding(UTF-8)';
my $é = "\x{e9}";
print $é;
printf "%v02x\n", encode_utf8($é);
Output:
$ ./u.pl
é
c3.a9
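To connect this back to the question, here is a rough sketch of the two steps a server performs on a form-encoded body before the handler sees the parameter. It is not Mojolicious's actual code, and the body string is an assumption about what a client sends for a => 'été' with charset UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed request body for the form field a => 'été' sent as UTF-8.
my $body = 'a=%C3%A9t%C3%A9';

my ($name, $value) = split /=/, $body, 2;

# Step 1: undo percent-encoding; the result is a byte string.
(my $raw = $value) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# Step 2: decode those bytes as UTF-8; the result is a character string.
my $text = decode('UTF-8', $raw);

printf "%s: %d bytes on the wire, %d characters after decoding\n",
    $name, length($raw), length($text);
```

The handler receives the result of step 2, so a dump of the parameter shows the code points E9 74 E9 rather than the wire bytes C3 A9 74 C3 A9.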