I'm running into what looks (to me, at least) like inconsistent behaviour when parsing XML:
use 5.14.2;
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml;
{
    local $/;    # slurp mode: read the whole DATA section at once
    $xml = <DATA>;
}
my $xmlParsed = XMLin($xml,
    KeyAttr    => {phone => 'type', tankstelle => 'id'},
    ForceArray => [ 'phone' ],
    ContentKey => '-content',
);
say Dumper($$xmlParsed{'tankstelle'});
__DATA__
<?xml version="1.0"?>
<tankstellen>
<tankstelle>
<id>63</id>
<phone type="main">0911 731586</phone>
<phone type="fax">0911 7592228</phone>
<number/>
</tankstelle>
<tankstelle>
<id>64</id>
<phone type="main">0911 732011</phone>
<phone type="fax"></phone>
<number>64</number>
</tankstelle>
<tankstelle>
<id>91</id>
<phone type="main">0911 732926</phone>
<phone type="fax">0911 732917</phone>
<number/>
</tankstelle>
<tankstelle>
<id>92</id>
<phone type="main">0911 737577</phone>
<phone type="fax"></phone>
<number/>
</tankstelle>
</tankstellen>
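Sometimes number is a hash and sometimes it is a string. And whenever the phone element with type="fax" is empty, main turns into a hash with a content key instead of a plain string.
I have tried various parser options to get rid of the hashes in main and number, with no luck.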
'64' => {
    'number' => '64',
    'phone' => {
        'main' => {
            'content' => '0911 732011'
        },
        'fax' => {}
    }
},
'91' => {
    'phone' => {
        'fax' => '0911 732917',
        'main' => '0911 732926'
    },
    'number' => {}
}
Answer 0 (score: 2)
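It is unfortunate that XML::Simple is probably the most complicated XML module on CPAN, yet beginners pick it hoping for an easy ride. Its own documentation now says this:
"The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended."
You have seen for yourself how hard it is to get it to behave properly with anything but the simplest XML, and it has the big drawback that it treats attributes in the same way as elements.
Following that advice from the module's author, this short program builds something like the data structure I think you want, with the advantage that you can modify it to create whatever structure you like from the XML.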
use strict;
use warnings;
use XML::LibXML;
use Data::Dump;
my $xml = XML::LibXML->load_xml(IO => \*DATA);
my %data;
for my $ts ($xml->findnodes('/tankstellen/tankstelle')) {
    my $id = $ts->findvalue('id');
    # findvalue returns an empty string for an empty or missing element
    $data{$id}{number} = $ts->findvalue('number');
    for my $phone ($ts->findnodes('phone')) {
        my $type = $phone->findvalue('@type');
        $data{$id}{phone}{$type} = $phone->findvalue('text()');
    }
}
dd \%data;
__DATA__
<?xml version="1.0"?>
<tankstellen>
<tankstelle>
<id>63</id>
<phone type="main">0911 731586</phone>
<phone type="fax">0911 7592228</phone>
<number/>
</tankstelle>
<tankstelle>
<id>64</id>
<phone type="main">0911 732011</phone>
<phone type="fax"></phone>
<number>64</number>
</tankstelle>
<tankstelle>
<id>91</id>
<phone type="main">0911 732926</phone>
<phone type="fax">0911 732917</phone>
<number/>
</tankstelle>
<tankstelle>
<id>92</id>
<phone type="main">0911 737577</phone>
<phone type="fax"></phone>
<number/>
</tankstelle>
</tankstellen>
Output
{
  63 => {
          number => "",
          phone  => { fax => "0911 7592228", main => "0911 731586" },
        },
  64 => {
          number => 64,
          phone  => { fax => "", main => "0911 732011" },
        },
  91 => {
          number => "",
          phone  => { fax => "0911 732917", main => "0911 732926" },
        },
  92 => {
          number => "",
          phone  => { fax => "", main => "0911 737577" },
        },
}
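As one illustration of the "modify it to create whatever structure you like" point, here is a minimal sketch that stores undef instead of an empty string for empty elements. The helper name text_or_undef is hypothetical (it is not part of XML::LibXML), and the XML is a shortened copy of the question's data, inlined so that the snippet stands on its own.

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened copy of the question's data (two stations only).
my $xml = XML::LibXML->load_xml(string => <<'END_XML');
<?xml version="1.0"?>
<tankstellen>
  <tankstelle>
    <id>64</id>
    <phone type="main">0911 732011</phone>
    <phone type="fax"></phone>
    <number>64</number>
  </tankstelle>
  <tankstelle>
    <id>91</id>
    <phone type="main">0911 732926</phone>
    <phone type="fax">0911 732917</phone>
    <number/>
  </tankstelle>
</tankstellen>
END_XML

# Hypothetical helper: string value of the first node matching $xpath,
# or undef when that node is missing or empty.
sub text_or_undef {
    my ($node, $xpath) = @_;
    my $value = $node->findvalue($xpath);
    return length $value ? $value : undef;
}

my %data;
for my $ts ($xml->findnodes('/tankstellen/tankstelle')) {
    my $id = $ts->findvalue('id');
    $data{$id}{number} = text_or_undef($ts, 'number');
    for my $phone ($ts->findnodes('phone')) {
        my $type = $phone->getAttribute('type');
        # '.' selects the current node, i.e. the phone element itself
        $data{$id}{phone}{$type} = text_or_undef($phone, '.');
    }
}

# Print one line per station, showing '(none)' for undefined values.
for my $id (sort { $a <=> $b } keys %data) {
    printf "%s: number=%s main=%s fax=%s\n", $id,
        map { $_ // '(none)' }
        $data{$id}{number}, @{ $data{$id}{phone} }{qw(main fax)};
}
```

Whether undef or an empty string is the better marker for "no value" depends on what consumes the structure; the point is only that the choice is made in one place that you control, rather than by the parser's options.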
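Answer 1 (score: 1)
As already mentioned, XML::LibXML is highly recommended.
However, if memory efficiency matters more to you than CPU speed (for large XML documents, say), there is another option worth considering: XML::Reader::PP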
use strict;
use warnings;
use XML::Reader::PP;
use Data::Dump;
my $rdr = XML::Reader::PP->new(\*DATA, { mode => 'branches' },
    { root => '/tankstellen/tankstelle', branch => [
        'id',
        'phone[@type="main"]',
        'phone[@type="fax"]',
        'number',
    ]});
my %data;
while ($rdr->iterate) {
    # one value per branch expression, in the order listed above
    my ($id, $ph_main, $ph_fax, $num) = $rdr->value;
    # replace undefined values with empty strings
    $_ //= '' for ($id, $ph_main, $ph_fax, $num);
    $data{$id}{'number'} = $num;
    $data{$id}{'phone'}{'main'} = $ph_main;
    $data{$id}{'phone'}{'fax'} = $ph_fax;
}
dd \%data;
__DATA__
<?xml version="1.0"?>
<tankstellen>
<tankstelle>
<id>63</id>
<phone type="main">0911 731586</phone>
<phone type="fax">0911 7592228</phone>
<number/>
</tankstelle>
<tankstelle>
<id>64</id>
<phone type="main">0911 732011</phone>
<phone type="fax"></phone>
<number>64</number>
</tankstelle>
<tankstelle>
<id>91</id>
<phone type="main">0911 732926</phone>
<phone type="fax">0911 732917</phone>
<number/>
</tankstelle>
<tankstelle>
<id>92</id>
<phone type="main">0911 737577</phone>
<phone type="fax"></phone>
<number/>
</tankstelle>
</tankstellen>
Output:
{
  63 => {
          number => "",
          phone  => { fax => "0911 7592228", main => "0911 731586" },
        },
  64 => {
          number => 64,
          phone  => { fax => "", main => "0911 732011" },
        },
  91 => {
          number => "",
          phone  => { fax => "0911 732917", main => "0911 732926" },
        },
  92 => {
          number => "",
          phone  => { fax => "", main => "0911 737577" },
        },
}