In Perl, how can I check whether an encoding specified in a string is valid?

Asked: 2012-03-31 13:14:46

Tags: perl file-io character-encoding

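Say I have a sub that receives two arguments: an encoding specification and a file path. The sub then uses that information to open a file for reading, as shown below, stripped down to its essentials: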

run({
    encoding => 'UTF-16---LE',
    input_filename => 'test_file.txt',
});

sub run {
    my $args = shift;
    my ($enc, $fn) = @{ $args }{qw(encoding input_filename)};

    my $is_ok = open my $in,
        sprintf('<:encoding(%s)', $args->{encoding}),
        $args->{input_filename}
    ;
}

Now, this croaks:

Cannot find encoding "UTF-16---LE" at E:\Home\...

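What is the proper way to make sure that $args->{encoding} holds a valid encoding specification before interpolating it into the second argument to open?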

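Update

The following information is provided in the hope that it will be useful to someone at some point. I am also going to file a bug report.

The documentation for Encode::Alias does not mention find_alias at all. A casual look at Encode/Alias.pm on my Windows system shows: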

# Public, encouraged API is exported by default

our @EXPORT =
  qw (
  define_alias
  find_alias
);

However, note that:

#!/usr/bin/env perl

use 5.014;
use Encode::Alias;
say find_alias('UTF-8')->name;

yields:

Use of uninitialized value $find in exists at C:/opt/Perl/lib/Encode/Alias.pm line 25.
Use of uninitialized value $find in hash element at C:/opt/Perl/lib/Encode/Alias.pm line 26.
Use of uninitialized value $find in pattern match (m//) at C:/opt/Perl/lib/Encode/Alias.pm line 31.
Use of uninitialized value $find in lc at C:/opt/Perl/lib/Encode/Alias.pm line 40.
Use of uninitialized value $find in pattern match (m//) at C:/opt/Perl/lib/Encode/Alias.pm line 31.
Use of uninitialized value $find in lc at C:/opt/Perl/lib/Encode/Alias.pm line 40.

Being 1) lazy and 2) inclined to assume first that I was doing something wrong, I decided to seek the wisdom of others.

In any case, the bug is due to find_alias being exported as a function while the code never checks for that calling style:

sub find_alias {
    require Encode;
    my $class = shift;
    my $find  = shift;
    unless ( exists $Alias{$find} ) {

If find_alias is not called as a method, the argument ends up in $class and $find is left undefined.
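To make the calling-convention problem concrete, here is a minimal sketch. The package My::Lookup, its find sub, and the one-entry alias table are hypothetical stand-ins invented for illustration, not the real Encode::Alias code; only the two shift lines mirror the excerpt above.

```perl
use strict;
use warnings;

package My::Lookup;    # hypothetical package, for illustration only

# Toy alias table; the real module builds its table differently.
my %Alias = ( 'latin9' => 'iso-8859-15' );

# Same argument handling as the find_alias excerpt above.
sub find {
    my $class = shift;    # the invocant, if called as a method
    my $find  = shift;    # the name to look up
    return unless defined $find;
    return $Alias{ lc $find };
}

package main;

# Method call: $class is 'My::Lookup', $find is 'Latin9'.
my $as_method = My::Lookup->find('Latin9');

# Function call: 'Latin9' lands in $class, so $find is undef.
my $as_function = My::Lookup::find('Latin9');

print "method call:   ", ( defined $as_method   ? $as_method   : 'undef' ), "\n";
print "function call: ", ( defined $as_function ? $as_function : 'undef' ), "\n";
```

The sketch guards against the undefined $find; the excerpt above does not, which is where the "uninitialized value" warnings come from.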

HTH。

2 Answers:

Answer 0 (score: 5)

Encode::Alias->find_alias($encoding_name) returns an object whose name attribute is the canonical encoding name on success, and returns false on failure.

$ Encode::Alias->find_alias('UTF-16---LE')
$ Encode::Alias->find_alias('UTF-16 LE')
Encode::Unicode  {
    Parents       Encode::Encoding
    Linear @ISA   Encode::Unicode, Encode::Encoding
    public methods (6) : bootstrap, decode, decode_xs, encode, encode_xs, renew
    private methods (0)
    internals: {
        endian   "v",
        Name   "UTF-16LE",
        size   2,
        ucs2   ""
    }
}
$ Encode::Alias->find_alias('Latin9')
Encode::XS  {
    public methods (9) : cat_decode, decode, encode, mime_name, name, needs_lines, perlio_ok, renew, renewed
    private methods (0)
    internals: 140076283926592
}
$ Encode::Alias->find_alias('UTF-16 LE')->name
UTF-16LE
$ Encode::Alias->find_alias('Latin9')->name
iso-8859-15

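Answer 1 (score: 4)

You can use the find_encoding function from Encode. However, if you want to use the result as an :encoding layer, you should also check perlio_ok. It is possible (though rare) for an encoding to exist but not support use with :encoding.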

use Carp qw(croak);
use Encode qw(find_encoding);

sub run {
    my $args = shift;
    my $enc = find_encoding($args->{encoding}) 
      or croak "$args->{encoding} is not a valid encoding";
    $enc->perlio_ok or croak "$args->{encoding} does not support PerlIO";

    my $is_ok = open my $in,
        sprintf('<:encoding(%s)', $enc->name),
        $args->{input_filename}
    ;
}

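Note: find_encoding handles aliases defined by Encode::Alias.

If you don't care about distinguishing encodings that don't exist from those that don't support :encoding, you can just use the perlio_ok function: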

Encode::perlio_ok($args->{encoding}) or croak "$args->{encoding} not supported";
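If you would rather get a value back than croak, the two checks from the run sub above can be folded into a small helper. This is only a sketch: usable_layer_name is a hypothetical name, and it assumes that find_encoding returns false for unknown names and that the returned object provides perlio_ok and name, as used in the code above.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: returns the canonical encoding name if $name is
# known to Encode and usable as an :encoding layer, or nothing otherwise.
sub usable_layer_name {
    my ($name) = @_;
    return unless defined $name && length $name;

    my $enc = find_encoding($name) or return;    # unknown encoding
    return unless $enc->perlio_ok;               # known, but no PerlIO support
    return $enc->name;
}

for my $candidate ( 'UTF-16---LE', 'Latin9' ) {
    my $canonical = usable_layer_name($candidate);
    printf "%-12s => %s\n", $candidate,
        defined $canonical ? $canonical : 'not usable';
}
```

The caller can then interpolate the returned canonical name into the open mode string, or report an error of its own choosing when nothing comes back.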