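I am having trouble using directory names that contain UTF-8 characters on a Mac (10.11.2) with Perl 5.22 and PostgreSQL 9.4. The text encoding in PostgreSQL is set to UTF8.
If a directory name contains non-ASCII UTF-8 characters, I can chdir() to that directory when the name is read in by the Perl script or embedded in a string in the script. If I insert the name into a PG table and read it back (SELECT dirname FROM utfdirs), I cannot chdir() to that directory. Yet the strings printed on screen are identical, a Perl cmp test on the two strings reports that they are the same, and guess_encoding() reports UTF8 for both.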
#!/opt/local/bin/perl5.22
use strict;
use Cwd;
use DBI;
use Encode;
use Encode qw/from_to/;
use Encode::Detect;
use Encode::Guess;
use Encode::UTF8Mac;
#
Encode::Guess->add_suspects(qw/utf-8-mac/);
#
my $dbname = 'test';
my $dbh = DBI->connect("dbi:Pg:dbname=$dbname;host=localhost");
$dbh->do("SET client_min_messages TO WARNING");
#
my $homeDir = '/Users/jldasch';
chdir($homeDir) or die "Cannot cd to [$homeDir]\n";
opendir(my $dh, ".") or die "Cannot open directory [$homeDir]: $!\n";
my @tdlist = sort grep(/(Lambda?)|(Delta?)/,readdir($dh));
closedir($dh);
$dbh->do("DELETE FROM utfdirs");
my $ins = $dbh->prepare("INSERT INTO utfdirs (dirname) VALUES (?)");
foreach my $d (@tdlist) {
    chdir($homeDir);
    my $ok = chdir($d) ? 1 : 0;
    my $fp = "${homeDir}/${d}";
    printf("%2d %s\n",$ok,$fp);
    $ins->execute($fp);
}
my $rset = $dbh->selectall_arrayref("SELECT dirname FROM utfdirs ORDER BY dirname");
my $i = 0;
foreach my $r (@$rset) {
    my $dbdir = $r->[0];
    my $pdir = $homeDir . '/' . $tdlist[$i++];
    print "$dbdir $pdir\n";
    my $encPerl = guess_encoding($pdir);
    my $encDb = guess_encoding($dbdir);
    print "Perl Encoding [$encPerl->{Name}]\n";
    print "Db Encoding [$encDb->{Name}]\n";
    unless ( chdir($dbdir) ) {
        print "Cannot CD to DbDir [$dbdir]\n";
        print "DbDir and PerlDir Match\n" if ($dbdir eq $pdir);
    }
}
exit;
Output:
bash-3.2$ ./utfstuff2.pl
1 /Users/jldasch/DeltaΔ
1 /Users/jldasch/Lambdaλ
/Users/jldasch/DeltaΔ /Users/jldasch/DeltaΔ
Perl Encoding [utf8]
Db Encoding [utf8]
Cannot CD to DbDir [/Users/jldasch/DeltaΔ]
DbDir and PerlDir Match
/Users/jldasch/Lambdaλ /Users/jldasch/Lambdaλ
Perl Encoding [utf8]
Db Encoding [utf8]
Cannot CD to DbDir [/Users/jldasch/Lambdaλ]
DbDir and PerlDir Match
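So at the level I have checked so far (cmp and guess_encoding()), Perl tells me the strings are the same, and they print the same, yet they are not the same.
How can I convert the UTF8 string returned by PostgreSQL into a string that Perl's chdir() accepts as a valid directory name?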
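One way to look below the printed form is to dump each string's internal state: whether Perl's UTF8 flag is set, its length, and its code points. The sketch below is only an illustration. The helper name `describe_string` and the two sample strings are hypothetical, and nothing here establishes what the database driver actually returned in this case.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): summarize a string's
# internal state so that two strings that print identically can be compared.
sub describe_string {
    my ($s) = @_;
    my $flag  = utf8::is_utf8($s) ? 'on' : 'off';
    my $codes = join ' ', map { sprintf 'U+%04X', ord } split //, $s;
    return sprintf 'utf8 flag=%s length=%d [%s]', $flag, length($s), $codes;
}

# Sample data (assumed for illustration): the same name as UTF-8 octets
# and as a decoded character string.
my $as_octets = "Delta\xCE\x94";
my $as_chars  = "Delta\x{394}";

print describe_string($as_octets), "\n";
print describe_string($as_chars),  "\n";
```

Calling `describe_string($pdir)` and `describe_string($dbdir)` inside the second loop would show whether the two values differ in length, flag, or code points even though they look the same on screen.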
Answer 0 (score: 0)
There is a module, Encode::UTF8Mac, that seems to solve this problem.
my $macOkDir = Encode::decode('utf-8-mac',$dbdir);
- John Daschbach
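
A related sketch, using a different technique from the Encode::UTF8Mac call above: if the value read back from the database turns out to be a decoded character string, it can be encoded to UTF-8 octets with the core Encode module before it is passed to chdir(). Whether that is the case here is an assumption, since the post does not establish what the driver returns. The variable names and the sample path are hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Assumption: the value from the database is a decoded character string
# (one Perl character per code point), not raw UTF-8 octets.
my $dbdir_chars = "/tmp/Lambda\x{3BB}";    # hypothetical stand-in for a database value

# Encode the characters to UTF-8 octets before handing them to a system call.
my $dbdir_octets = Encode::encode('UTF-8', $dbdir_chars);

printf "characters: %d, octets: %d\n", length($dbdir_chars), length($dbdir_octets);

# Attempt chdir() with the octet string; warn instead of dying if it fails.
chdir($dbdir_octets) or warn "Cannot chdir to the encoded path: $!\n";
```

If the value is already an octet string, encoding it again would double-encode it, so it is worth checking which form the string is in first.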