Creating a Unicode character with pack

Time: 2015-06-15 17:50:11

Tags: perl unicode

I am trying to understand how Perl handles Unicode.

use feature qw(say);
use strict;
use warnings;

use Encode qw(encode);

say unpack "H*", pack("U", 0xff);
say unpack "H*", encode( 'UTF-8', chr 0xff );

Output:

ff
c3bf

Why do I get ff rather than c3bf when using pack?

2 answers:

Answer 0 (score: 2)

  

Why do I get ff rather than c3bf when using pack?

This is because pack creates a character string, not a byte string.

> perl -MDevel::Peek -e 'Dump(pack("U", 0xff));'
SV = PV(0x13a6d18) at 0x13d2ce8
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8)
  PV = 0xa6d298 "\303\277"\0 [UTF8 "\x{ff}"]
  CUR = 2
  LEN = 32

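Therefore, unpack("H*") does not look at the byte values of that string's internal encoding, but at its (truncated) character values. If you do this instead: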

say unpack "H*", encode("UTF-8", pack("U", 0xff));

then you get the expected result.
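To make the character/byte distinction concrete, here is a minimal sketch that compares the packed character with its UTF-8 encoding. The variable names `$char` and `$bytes` are illustrative only and do not come from the original answer.

```perl
use strict;
use warnings;
use feature qw(say);
use Encode qw(encode);

# A string holding one character, code point U+00FF.
my $char = pack("U", 0xff);

# The same character encoded as UTF-8: a string of two bytes.
my $bytes = encode("UTF-8", $char);

say length $char;          # 1 (one character)
say length $bytes;         # 2 (two bytes)
say unpack "H*", $char;    # ff   (hex of the character value)
say unpack "H*", $bytes;   # c3bf (hex of the encoded bytes)
```

The internal buffer of `$char` happens to contain the bytes c3 bf, as the Dump output shows, but Perl-level operations such as length and unpack see the single character FF.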

See also this thread

Answer 1 (score: 2)

pack('U', 0xFF)

is just a weird way of doing

chr(0xFF)

So

"\xFF"                             returns chars   FF
chr(0xFF)                          returns chars   FF
pack('U', 0xFF)                    returns chars   FF

"\xC3\xBF"                         returns chars   C3 BF
encode('UTF-8', chr(0xFF))         returns chars   C3 BF
encode('UTF-8', pack('U', 0xFF))   returns chars   C3 BF

So

say unpack "H*", "\xFF";                             outputs   ff
say unpack "H*", chr(0xFF);                          outputs   ff
say unpack "H*", pack('U', 0xFF);                    outputs   ff

say unpack "H*", "\xC3\xBF";                         outputs   c3bf
say unpack "H*", encode('UTF-8', pack('U', 0xFF));   outputs   c3bf
say unpack "H*", encode('UTF-8', chr(0xFF));         outputs   c3bf
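
The equivalences listed above can be checked directly with eq, which compares strings character by character. This is only a sketch; the array names `@as_chars` and `@as_bytes` are made up for illustration.

```perl
use strict;
use warnings;
use feature qw(say);
use Encode qw(encode);

# Three ways of building the one-character string FF.
my @as_chars = ("\xFF", chr(0xFF), pack('U', 0xFF));

# Three ways of building the two-character string C3 BF.
my @as_bytes = (
    "\xC3\xBF",
    encode('UTF-8', chr(0xFF)),
    encode('UTF-8', pack('U', 0xFF)),
);

# Every string in a group compares equal to the first one in that group.
my $chars_match = !grep { $_ ne $as_chars[0] } @as_chars;
my $bytes_match = !grep { $_ ne $as_bytes[0] } @as_bytes;

say $chars_match ? "all FF strings are equal"    : "FF strings differ";
say $bytes_match ? "all C3 BF strings are equal" : "C3 BF strings differ";
```

Both lines report equality, even though pack('U', ...) stores its result with the UTF8 flag set while the "\xFF" literal is a different internal representation of the same character.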