Question

我需要确保我使用perl脚本创建的输出文件具有代码集cp1252而不是UTF-8，因为它将在UNIX SQLplus框架中使用，该框架处理德语＆＃34; umlauts＆＃34;将值插入数据库列时不正确（我在Windows 10中使用草莓perl v5.18，我无法在UNIX SQL环境中设置NLS_LANG或chcp）。

使用这个小测试脚本，我可以重现输出文件＆＃34; testfile1.txt＆＃34;总是在UTF-8中，但是＆＃34; testfile2.txt＆＃34;是预期的CP1252。我如何强制输出＆＃34; testfile1.txt＆＃34;即使没有＆＃34;特殊＆＃34;也是CP1252文字中的字符？

#!/usr/bin/env perl -w
use strict;
use Encode;

# the result file under Windows 10 will have UTF-8 codeset
open(OUT,'> testfile1.txt');    
binmode(OUT,"encoding(cp-1252)");
print OUT encode('cp-1252',"this is a test");
close(OUT);

# the result file under Windows 10 will have Windows-cp1252 codeset
open(OUT,'> testfile2.txt');    
binmode(OUT,"encoding(cp-1252)");
print OUT encode('cp-1252',"this is a test with german umlauts <ÄäÜüÖöß>");
close(OUT);

Answer 1

我认为你的问题是基于一种误解。 testfile1.txt包含文字this is a test。这些字符在ASCII，Latin-1，UTF-8和CP-1252中具有相同的编码。 testfile1.txt同时适用于所有这些编码。

在源代码中包含文字Unicode字符，如下所示：

print OUT encode('cp-1252',"this is a test with german umlauts <ÄäÜüÖöß>");

你需要

use utf8;

在顶部。

此外，不要将文件句柄上的编码图层与显式encode()调用相结合。设置编码层并向其打印Unicode文本，或使用binmode(OUT)并将原始字节（从encode()返回）打印到其中。

顺便说一下，你不应该再使用-w了。

取代了它

use warnings;

附注

同样，裸字文件句柄和双参数打开是5.6之前的样式代码，不应该在2000年之后编写的代码中使用。（perl 5.005及更早版本也不支持Unicode /编码。）< / p>

您的代码的固定版本如下所示：

#!/usr/bin/env perl
use strict;
use warnings;
use utf8;

{
    open(my $out, '>:encoding(cp-1252)', 'testfile1.txt') or die "$0: testfile1.txt: $!\n";    
    print $out "this is a test\n";
    close($out);
}

{
    open(my $out, '>encoding(cp-1252)', 'testfile2.txt') or die "$0: testfile2.txt: $!\n";    
    print $out "this is a test with german umlauts <ÄäÜüÖöß>\n";
    close($out);
}

如何在Windows 10中的perl＆gt; = 5.18中强制输出文件的代码集cp1252？

1 个答案: