UTF-8 encoded JSON file, "Wide character" error when parsing with the JSON module

Time: 2019-02-05 13:41:32

Tags: perl utf-8

I have a very simple script in Perl:

use JSON;

use open qw/ :std :encoding(utf8) /;

#my $ref = JSON::decode_json($json_contents);

my $path = "/home/chambres/web/x.org/public_html/cgi-bin/links/admin/booking_import/import/file.json";

my $json_contents = slurp_utf8_file($path);

my $ref =  JSON->new->utf8->decode($json_contents);

sub slurp_utf8_file {

  my @back;
  #open my $in,  '<:encoding(UTF-8)',  $_[0]  or die $!;
  open my $in,  "<$_[0]" or die $!;
    while (<$in>) {
      push @back, $_
    }
  close ($in);

  return join("", @back);
}

Notepad++ shows the file as UTF-8 encoded.

However, when I run the script, I get:

perl test.cgi
Wide character in subroutine entry at test.cgi line 11.

Line 11 is:

my $ref =  JSON->new->utf8->decode($json_contents);

I'm confused about what I'm doing wrong. Maybe I just need a break! Any suggestions would be much appreciated!

1 answer:

Answer 0: (score: 3)

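You are decoding the UTF-8 twice. The use open pragma with :encoding(utf8) already makes open decode the file contents into Perl characters, and ->utf8 then tells JSON to expect undecoded UTF-8 bytes and decode them again. The following script reproduces the problem with an explicit encoding layer: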

#!/usr/bin/perl
use strict;
use warnings;

use JSON;
use Data::Dumper;

open(my $fh,  '<:encoding(UTF-8)', $ARGV[0]) or die $!;
my @lines = <$fh>;
close($fh) or die $!;

# With ->utf8 on already decoded text this dies with:
# Wide character in subroutine entry at dummy.pl line 14.
#my $ref = JSON->new->utf8->decode(join('', @lines));

# Without ->utf8: OK, no error.
my $ref = JSON->new->decode(join('', @lines));

print Dumper($ref);

exit 0;

Test run

$ cat dummy.json
{
   "path": "ä⁈"
}

# with ->utf8
$ perl dummy.pl dummy.json
Wide character in subroutine entry at dummy.pl line 14.

# without ->utf8
$ perl dummy.pl dummy.json
$VAR1 = {
          'path' => "\x{e4}\x{2048}"
        };
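
The rule behind the answer is that decoding has to happen exactly once: either the file is read as raw bytes and ->utf8 decodes them, or the I/O layer decodes and ->utf8 is left out. The following is only a minimal sketch of both pairings, not the asker's actual setup. It assumes JSON::PP (the pure-Perl implementation bundled with Perl, offering the same new, utf8 and decode methods) as a stand-in for JSON, and the helper names slurp_raw and slurp_decoded as well as the temporary sample file are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use JSON::PP;                  # assumption: stand-in for JSON, ships with Perl
use File::Temp qw(tempfile);   # core module, used only to build a sample file

# Write a small UTF-8 JSON sample as raw bytes (hypothetical test data).
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':raw');
print {$tmp_fh} qq({"path": "\xC3\xA4\xE2\x81\x88"}\n);   # UTF-8 bytes of U+00E4 U+2048
close($tmp_fh) or die $!;

# Pairing A: read undecoded bytes, then let ->utf8 do the decoding.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                  # slurp mode: read the whole file at once
    my $bytes = <$fh>;
    close($fh) or die $!;
    return $bytes;
}

# Pairing B: let the I/O layer decode, so the JSON object must not use ->utf8.
sub slurp_decoded {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
    local $/;
    my $chars = <$fh>;
    close($fh) or die $!;
    return $chars;
}

my $from_bytes = JSON::PP->new->utf8->decode(slurp_raw($tmp_path));
my $from_chars = JSON::PP->new->decode(slurp_decoded($tmp_path));

# The mismatched pairing (decoded text plus ->utf8) is the one that fails.
my $mismatch_ok = eval { JSON::PP->new->utf8->decode(slurp_decoded($tmp_path)); 1 };

print "same result\n" if $from_bytes->{path} eq $from_chars->{path};
printf "length: %d\n", length $from_bytes->{path};
print "double decoding failed: $@" unless $mismatch_ok;
```

Both correct pairings should give the same two-character string, so the script prints "same result" and "length: 2", followed by the error raised by the mismatched call. In the question's script, use open already applies the decoding layer to the two-argument open, so the smallest change there is to drop ->utf8.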