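I have this script. somefile.xsd is a file containing a few UTF-8 characters. However, I find that I can't get guess_encoding to report the same encoding as Encode::Guess->guess. Ignoring the fact that it's an XSD, what obvious thing (I'm sure it's probably obvious) am I missing?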
use strict;
use warnings;
use Encode;
use Encode::Guess;

open(my $fh, "<", "somefile.xsd") or die "Cannot open somefile.xsd: $!";
print "Reading file...\n";
while (my $text = <$fh>) {
    # First attempt: class-method call with the default suspects.
    my $encoding1 = Encode::Guess->guess($text);
    if (ref($encoding1)) {
        my $name = $encoding1->name;
        print "$name : $text" if $name ne "ascii";
    } else {
        print "Not found : $text";
    }
    # Second attempt: exported function with an explicit suspect list.
    my $encoding2 = guess_encoding($text, qw/iso-8859-15 ascii iso-8859-1 utf8/);
    if (ref($encoding2)) {
        my $name = $encoding2->name;
        print "$name : $text" if $name ne "ascii";
    } else {
        print "Not found : $text";
    }
}
close($fh);
When I run it, it gives the following results:
H:\play>perl encoding.pl
Reading file...
utf8 : <xs:enumeration value="Bokmål, Norwegian; Norwegian Bokmål"/>
Not found : <xs:enumeration value="Bokmål, Norwegian; Norwegian Bokmål"/>
utf8 : <xs:enumeration value="Occitan (post 1500); Provenæ ¬"/>
Not found : <xs:enumeration value="Occitan (post 1500); Provenæ ¬"/>
utf8 : <xs:enumeration value="Volapük"/>
Not found : <xs:enumeration value="Volapük"/>
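Edit for clarification: I want to use the guess_encoding version with the second argument (i.e. the list of suspects). Removing the list just amounts to calling Encode::Guess->guess. The use case is that I want to check whether a file matches one of a set of encodings, and passing the valid list seems more elegant than calling guess and looking the name up in a list, especially when I have {{ 1}} giving me a $encoding->name result, which means I can't simply check the list for equality.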
Answer 0 (score: 0)
Try removing the qw list:
$encoding2 = guess_encoding($text);
This should give you the right answer.
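As far as I can tell, removing the list helps because the default suspects (ascii and utf8) leave exactly one match, whereas the ISO-8859 suspects can decode the same bytes too, so more than one candidate survives. In that case guess_encoding returns a plain string instead of an encoding object. Here is a minimal sketch that prints that string rather than a fixed "Not found"; the sample text is made up for illustration and is not taken from somefile.xsd.

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;

# Hypothetical sample: "Volapük" as UTF-8 octets (an assumption, not file data).
my $octets = encode('UTF-8', "Volap\x{fc}k");

# Default suspects only.
my $default = guess_encoding($octets);

# Same octets with the suspect list from the question.
my $listed = guess_encoding($octets, qw/iso-8859-15 ascii iso-8859-1 utf8/);

for my $result ($default, $listed) {
    if (ref $result) {
        # An encoding object: the guess was unambiguous.
        print "Guessed: ", $result->name, "\n";
    } else {
        # A plain string: the message explains why no single guess was made.
        print "No single match: $result\n";
    }
}
```

The exact wording of the message may differ between Encode versions, so it is better to print it than to compare it against a fixed string.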
EDIT。
Running this code:
use strict;
use warnings;
use Encode;
use Encode::Guess;

open(my $fh, "<", "somefile.xsd") or die "Cannot open somefile.xsd: $!";
print "Reading file...\n";
while (my $text = <$fh>) {
    # 1: class-method call with the default suspects.
    my $encoding1 = Encode::Guess->guess($text);
    if (ref($encoding1)) {
        my $name = $encoding1->name;
        print "$name : $text" if $name ne "ascii";
    } else {
        print "Not found : $text";
    }
    # 2: exported function with an explicit suspect list.
    my $encoding2 = guess_encoding($text, qw/iso-8859-15 ascii iso-8859-1 utf8/);
    if (ref($encoding2)) {
        my $name = $encoding2->name;
        print "$name : $text" if $name ne "ascii";
    } else {
        print "Not found : $text";
    }
    # 3: exported function without a suspect list.
    my $encoding3 = guess_encoding($text);
    if (ref($encoding3)) {
        my $name = $encoding3->name;
        print "$name : $text" if $name ne "ascii";
    } else {
        print "Not found : $text";
    }
    print "-" x 40 . "\n";
}
close($fh);
produces
Reading file...
utf8 : <xs:enumeration value="Bokmål, Norwegian; Norwegian Bokmål"/>
Not found : <xs:enumeration value="Bokmål, Norwegian; Norwegian Bokmål"/>
utf8 : <xs:enumeration value="Bokmål, Norwegian; Norwegian Bokmål"/>
----------------------------------------
utf8 : <xs:enumeration value="Bokmål, Norwegian; Norwegian Bokmål"/>
Not found : <xs:enumeration value="Bokmål, Norwegian; Norwegian Bokmål"/>
utf8 : <xs:enumeration value="Bokmål, Norwegian; Norwegian Bokmål"/>
----------------------------------------
utf8 : <xs:enumeration value="Occitan (post 1500); Provenæ ¬"/>
Not found : <xs:enumeration value="Occitan (post 1500); Provenæ ¬"/>
utf8 : <xs:enumeration value="Occitan (post 1500); Provenæ ¬"/>
----------------------------------------
utf8 : <xs:enumeration value="Occitan (post 1500); Provenæ ¬"/>
Not found : <xs:enumeration value="Occitan (post 1500); Provenæ ¬"/>
utf8 : <xs:enumeration value="Occitan (post 1500); Provenæ ¬"/>
----------------------------------------
utf8 : <xs:enumeration value="Volap├╝k"/>
Not found : <xs:enumeration value="Volap├╝k"/>
utf8 : <xs:enumeration value="Volap├╝k"/>
----------------------------------------
utf8 : <xs:enumeration value="Volap├╝k"/>
Not found : <xs:enumeration value="Volap├╝k"/>
utf8 : <xs:enumeration value="Volap├╝k"/>
----------------------------------------
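For the stated use case (checking whether the data matches any of a set of encodings), one possible alternative is to try a strict decode with each candidate and collect the ones that succeed. This is only a sketch: matching_encodings is a hypothetical helper, not part of Encode, and the sample string is made up. Note that iso-8859-1 accepts any byte sequence, so it will always be reported as a match.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);

# Hypothetical helper: returns the canonical names of the candidates
# that decode $octets without error.
sub matching_encodings {
    my ($octets, @candidates) = @_;
    my @ok;
    for my $candidate (@candidates) {
        my $enc = find_encoding($candidate) or next;   # skip unknown names
        my $copy = $octets;   # decode may modify its input when CHECK is set
        my $clean = eval { $enc->decode($copy, Encode::FB_CROAK); 1 };
        push @ok, $enc->name if $clean;
    }
    return @ok;
}

# Made-up sample: "Volapük" as UTF-8 octets.
my $sample = encode('UTF-8', "Volap\x{fc}k");
print join(", ", matching_encodings($sample, qw/ascii latin1 utf8/)), "\n";
```

The helper reports $enc->name, which is the canonical name and can differ from the alias that was passed in (latin1 resolves to iso-8859-1), so any comparison against the input list should go through find_encoding first.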