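I'm writing a Perl script that creates an XML file, "settings.xml" (using XML::Writer). I want the file to be encoded as UCS-2 big endian, but I'm not sure how to do that.
I tried something like this: open(my $output, "> :encoding(UCS-2BE)", "settings.xml");
but all that does is make the file output a mess (something like http://i.imgur.com/p9cruCf.png, or a series of Chinese characters), while the file's encoding stays ANSI.
Any idea how to fix this, or how to convert the file to UCS-2?
I'm a beginner with Perl, so sorry if some of this doesn't make sense.
Edit: For anyone else who runs into this problem, see the answers below; they explain in detail how to solve it.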
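Answer 0 (score: 2)
XML::Writer doesn't support anything other than US-ASCII and UTF-8 (as documented for the ENCODING parameter of its constructor). Creating a UCS-2be XML document with XML::Writer is tricky, but not impossible.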
    use strict;
    use warnings;
    use XML::Writer qw( );

    my $qfn = 'settings.xml';  # Output file name, taken from the question.

    # XML::Writer doesn't encode for you, so we need to use :encoding.
    # The :raw avoids a problem with CRLF conversion on Windows.
    open(my $fh, '>:raw:encoding(UCS-2be)', $qfn)
        or die("Can't create \"$qfn\": $!\n");

    # This prints the BOM. It's optional, but it's useful when using an
    # encoding that's not a superset of US-ASCII (such as UCS-2be).
    print($fh "\x{FEFF}");

    my $writer = XML::Writer->new(
        OUTPUT   => $fh,
        ENCODING => 'US-ASCII',  # Use entities for > U+007F
    );

    $writer->xmlDecl('UCS-2be');
    $writer->startTag('root');
    $writer->characters("\x{00041}");
    $writer->characters("\x{000C9}");
    $writer->characters("\x{10000}");
    $writer->endTag();
    $writer->end();
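Downside: every character above U+007F appears as an XML entity (a numeric character reference). In the example above:
- U+0041 appears as "A" (00 41). Good.
- U+00C9 appears as "&#xC9;" (00 26 00 23 00 78 00 43 00 39 00 3B). Suboptimal, but OK.
- U+10000 appears as "&#x10000;" (00 26 00 23 00 78 00 31 00 30 00 30 00 30 00 30 00 3B). Good: an XML entity is required to store U+10000 in UCS-2be.
You can avoid this downside if and only if you can guarantee that no character above U+FFFF will ever be passed to the writer.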
    use strict;
    use warnings;
    use XML::Writer qw( );

    my $qfn = 'settings.xml';  # Output file name, taken from the question.

    # XML::Writer doesn't encode for you, so we need to use :encoding.
    # The :raw avoids a problem with CRLF conversion on Windows.
    open(my $fh, '>:raw:encoding(UCS-2be)', $qfn)
        or die("Can't create \"$qfn\": $!\n");

    # This prints the BOM. It's optional, but it's useful when using an
    # encoding that's not a superset of US-ASCII (such as UCS-2be).
    print($fh "\x{FEFF}");

    my $writer = XML::Writer->new(
        OUTPUT   => $fh,
        ENCODING => 'UTF-8',  # Don't use entities.
    );

    $writer->xmlDecl('UCS-2be');
    $writer->startTag('root');
    $writer->characters("\x{00041}");
    $writer->characters("\x{000C9}");
    #$writer->characters("\x{10000}");  # This causes a fatal error
    $writer->endTag();
    $writer->end();
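The fatal error mentioned in the commented-out line can be explored without XML::Writer. The following is only a minimal sketch using the core Encode module; the helper name `ucs2be_hex` is made up for illustration, and passing `Encode::FB_CROAK` is an assumption meant to approximate the strictness of the `:encoding` layer rather than reproduce it exactly.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: try to encode one character as UCS-2BE.
# Returns the bytes as an uppercase hex string, or undef if Encode refuses.
sub ucs2be_hex {
    my ($char) = @_;
    my $copy  = $char;  # encode() with a CHECK argument may modify its input
    my $bytes = eval { encode('UCS-2BE', $copy, Encode::FB_CROAK) };
    return defined($bytes) ? uc(unpack('H*', $bytes)) : undef;
}

for my $cp (0x41, 0xC9, 0x10000) {
    my $hex = ucs2be_hex(chr($cp));
    printf("U+%04X => %s\n", $cp,
        defined($hex) ? $hex : 'not encodable as UCS-2BE');
}
```

The two characters inside the Basic Multilingual Plane encode to two bytes each, while U+10000 should be rejected, which is why this variant is only safe when nothing above U+FFFF reaches the writer.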
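With ENCODING => 'UTF-8', the file contains:
- U+0041 as "A" (00 41). Good.
- U+00C9 as "É" (00 C9). Good.
The following is how to do it without any downsides: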
    use strict;
    use warnings;
    use Encode      qw( decode encode );
    use XML::Writer qw( );

    my $qfn = 'settings.xml';  # Output file name, taken from the question.

    my $xml = '';
    {
        # XML::Writer doesn't encode for you, so we need to use :encoding.
        # Here the document is first built in memory as UTF-8.
        open(my $fh, '>:encoding(UTF-8)', \$xml)
            or die("Can't open in-memory file: $!\n");

        # This prints the BOM. It's optional, but it's useful when using an
        # encoding that's not a superset of US-ASCII (such as UCS-2be).
        print($fh "\x{FEFF}");

        my $writer = XML::Writer->new(
            OUTPUT   => $fh,
            ENCODING => 'UTF-8',  # Don't use entities.
        );

        $writer->xmlDecl('UCS-2be');
        $writer->startTag('root');
        $writer->characters("\x{00041}");
        $writer->characters("\x{000C9}");
        $writer->characters("\x{10000}");
        $writer->endTag();
        $writer->end();

        close($fh);
    }

    # Fix encoding: replace characters above U+FFFF with entities,
    # then re-encode the whole document as UCS-2be.
    $xml = decode('UTF-8', $xml);
    $xml =~ s/([^\x{0000}-\x{FFFF}])/ sprintf('&#x%X;', ord($1)) /eg;
    $xml = encode('UCS-2be', $xml);

    open(my $fh, '>:raw', $qfn)
        or die("Can't create \"$qfn\": $!\n");
    print($fh $xml);
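The "Fix encoding" step can also be tried in isolation. This is a rough sketch using only the core Encode module; the function name `to_ucs2be_xml_bytes` is hypothetical, and the input is assumed to be an already decoded character string.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: replace every character above U+FFFF with a numeric
# character reference, then encode the result as UCS-2BE bytes.
sub to_ucs2be_xml_bytes {
    my ($text) = @_;
    $text =~ s/([^\x{0000}-\x{FFFF}])/ sprintf('&#x%X;', ord($1)) /eg;
    return encode('UCS-2BE', $text);
}

# Same three characters as in the answer: U+0041, U+00C9, U+10000.
my $bytes = to_ucs2be_xml_bytes("A\x{C9}\x{10000}");

# Print the bytes as space-separated hex pairs.
print(join(' ', map { sprintf('%02X', $_) } unpack('C*', $bytes)), "\n");
```

Note that the substitution runs over the whole document text, so it is only appropriate where a character reference is legal XML, such as character data and attribute values; a character above U+FFFF inside a comment, a CDATA section, or an element name would not survive this treatment correctly.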
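In the file written by this last version:
- U+0041 appears as "A" (00 41). Good.
- U+00C9 appears as "É" (00 C9). Good.
- U+10000 appears as "&#x10000;" (00 26 00 23 00 78 00 31 00 30 00 30 00 30 00 30 00 3B). Good: an XML entity is required to store U+10000 in UCS-2be.
Answer 1 (score: 1)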
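You don't describe what went wrong, but you may be hitting a bug that some versions of perl have on Windows, where the encoding and crlf layers interact badly. If so, this should work:
open(my $output, "> :raw:perlio:encoding(UCS-2BE):crlf:utf8", "settings.xml");
(See http://www.perlmonks.org/?node_id=608532 for an explanation.)
If not, please provide more information than "all that does is make the file output a mess". A short script that demonstrates the problem would help.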
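One way to report something more precise than "a mess" is to look at the first few bytes the script actually wrote. The sketch below is only an illustration: the helper `leading_bytes_hex` and the sample file name `ucs2be-sample.xml` are made up, and the sample is written with a plain `:raw:encoding(UCS-2BE)` layer stack rather than the Windows workaround above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $count bytes of a file
# as space-separated uppercase hex pairs.
sub leading_bytes_hex {
    my ($path, $count) = @_;
    open(my $in, '<:raw', $path)
        or die("Can't open \"$path\": $!\n");
    my $got = read($in, my $buf, $count);
    defined($got)
        or die("Can't read \"$path\": $!\n");
    close($in);
    return join(' ', map { sprintf('%02X', $_) } unpack('C*', $buf));
}

# Write a tiny sample file so there is something to inspect.
my $sample = 'ucs2be-sample.xml';  # hypothetical file name
open(my $out, '>:raw:encoding(UCS-2BE)', $sample)
    or die("Can't create \"$sample\": $!\n");
print($out "\x{FEFF}<root/>");
close($out)
    or die("Can't write \"$sample\": $!\n");

print(leading_bytes_hex($sample, 8), "\n");
```

A UCS-2 big-endian file that starts with a BOM begins with the bytes FE FF, and each ASCII character after it is preceded by a 00 byte. Stray 0D bytes in the middle of characters would point to the CRLF layer problem described above.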