Is there a way in Perl to determine whether a string is encoded as UTF-8 or cp1252?
Answer 0 (score: 1)
The core module Encode::Guess should be up to the task.
use Encode::Guess;
my $enc = guess_encoding($data, qw(cp1252)); # utf8 among defaults
then
ref($enc) or die "Can't guess: $enc"; # trap error this way
my $utf8 = $enc->decode($data);
(from the docs).
To avoid also using the defaults of "ascii, utf8 and UTF-16/32 with BOM", set the suspects first
Encode::Guess->set_suspects(qw(utf8 cp1252));
and then get the encoding
my $enc = guess_encoding($data);
Or, copied from the docs
my $decoder = Encode::Guess->guess($data);
die $decoder unless ref($decoder);
my $utf8 = $decoder->decode($data);
See the documentation for details.
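The steps above can be pulled together into one small script. This is a minimal sketch, assuming the input is a raw byte string; `guess_name` is a hypothetical helper name, not part of Encode::Guess. Keep in mind that bytes which form valid UTF-8 are usually also decodable as cp1252, so for such input the guess may come back as an error string rather than an object, and the helper then returns undef.

```perl
use strict;
use warnings;
use Encode::Guess;

# Hypothetical helper: returns the name of the guessed encoding,
# or undef when Encode::Guess cannot settle on a single candidate.
sub guess_name {
    my ($bytes) = @_;
    # cp1252 is checked in addition to the default suspects
    my $enc = guess_encoding($bytes, qw(cp1252));
    # On failure guess_encoding returns a plain error string, not an object
    return ref($enc) ? $enc->name : undef;
}

# A lone 0xE9 byte is not valid UTF-8, but it is a defined cp1252 character
my $bytes = "caf\xE9";
my $name  = guess_name($bytes);
print defined $name ? "Guessed: $name\n" : "Could not decide\n";
```

Returning undef instead of dying leaves it to the caller to decide what to do with ambiguous input, for example falling back to a project-wide default.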
Answer 1 (score: 1)
my $could_be_utf8 = utf8::decode( my $tmp = $string );
my $could_be_cp1252 = $string !~ /[\x81\x8D\x8F\x90\x9D]/;
If you need to handle strings that contain a mix of both, see "Fixing a file consisting of both UTF-8 and Windows-1252".
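For a string that is entirely in one encoding, the two checks in this answer suggest a simple decision rule: treat the bytes as UTF-8 when they decode cleanly, and fall back to cp1252 otherwise. The following is a sketch under that assumption, using Encode's strict `UTF-8` decoder instead of `utf8::decode`; `decode_utf8_or_cp1252` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns (decoded text, name of the encoding used).
sub decode_utf8_or_cp1252 {
    my ($bytes) = @_;

    # FB_CROAK makes decode die on malformed UTF-8;
    # LEAVE_SRC keeps $bytes intact so it can be reused for the fallback.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return ($text, 'UTF-8') if defined $text;

    # Not valid UTF-8: assume cp1252. Bytes that cp1252 leaves undefined
    # (0x81, 0x8D, 0x8F, 0x90, 0x9D) are replaced with a substitution character.
    return (decode('cp1252', $bytes), 'cp1252');
}

my ($text, $used) = decode_utf8_or_cp1252("caf\xE9");
print "Decoded as $used\n";
```

Checking UTF-8 first is a deliberate choice: almost any byte sequence is acceptable cp1252, while text written in cp1252 that contains non-ASCII characters rarely happens to be well-formed UTF-8.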