I want to parse a simple XML document using the XML::DOM module:
<?xml version ="1.0"?>
<Select>
    <book>
        <prop Name = "prop1" Title = "title1" />
        <prop Name = "prop2" Title = "title2" />
    </book>
    <fruit>
        <prop Name = "prop3" Title = "title3" />
        <prop Name = "prop4" Title = "title4" />
    </fruit>
</Select>
The expected output is:
$VAR1 = {
    Select => {
        book => {
            prop => [
                { Name => "prop1", Title => "title1" },
                { Name => "prop2", Title => "title2" },
            ],
        },
        fruit => {
            prop => [
                { Name => "prop3", Title => "title3" },
                { Name => "prop4", Title => "title4" },
            ],
        },
    },
}
The code I wrote is:
use strict;
use XML::DOM;
use Data::Dumper;
my @stack;
my %hash;
push @stack,\%hash;
my $parser = new XML::DOM::Parser;
my $doc = $parser -> parsefile('demo.xml');
my $root = $doc->getDocumentElement();
my $rootnode = $root->getTagName;
################################################################
foreach my $node ($doc->getElementsByTagName($rootnode)){
    push @stack,$stack[$#stack]->{$rootnode};
    my @childnode = $node->getChildNodes();
    foreach my $child(@childnode){
        if($child->isElementNode){
            my $childname = $child->getNodeName();
            pop(@stack);
            push @stack,$stack[$#stack]->{$rootnode} = {$childname,{}};
            my @childnodes2 = $child->getChildNodes();
            foreach my $subchild(@childnodes2){
                if($subchild->isElementNode){
                    my $subchildname = $subchild->getNodeName();
                    my $name = $subchild->getAttributes->getNamedItem('Name')->getNodeValue;
                    my $title = $subchild->getAttributes->getNamedItem('Title')->getNodeValue;
                    pop(@stack);
                    push @stack,$stack[$#stack]->{$rootnode}->{$child->getNodeName()} = {$subchildname,{}}; #{} contains $name or $title
                }
            }
        }
    }
}
print Dumper(\%hash);
I think I am not pushing to and popping from the array correctly. Also, I don't want to use XML::Simple or recursion.
How can I do this in Perl?
Answer 0 (score: 1)
Here is a possible solution. It assumes the whole document follows a strict schema: one Select element as the root, any number of child nodes with distinct names (name collisions are not handled), and any number of prop elements under those child nodes, where the Name and Title attributes are the only interesting ones.
Here is the preamble. I also use Carp for better error handling.
#!/usr/bin/perl
use strict; use warnings; use 5.012;
use XML::DOM;
use Data::Dumper;
use Carp;
Here is the main code. It starts a parser (assuming the document is in the special DATA filehandle) and passes the resulting document to the make_data_structure subroutine. I generally prefer to let the script die so that errors are caught early.
{
    my $xml_parser = XML::DOM::Parser->new;
    my $document_string = do{ local $/=undef; <DATA> };
    my $document = $xml_parser->parse($document_string) or die;
    my $data_structure = make_data_structure($document) or die;
    print Dumper $data_structure;
}
Here is the subroutine that does all the work. It takes a document and returns a hashref that conforms to your format.
sub make_data_structure {
    my ($document) = @_;
    my $root = $document->getDocumentElement;
    my $rootname = $root->getTagName // "undef";
    didnt_expect_anything(but=> "Select", as=> "the root tag", got=> $rootname)
        unless $rootname eq "Select";
    my $dsc = +{ $rootname => +{} };
    CHILD:
    for my $child ($root->getChildNodes) {
        next CHILD unless $child->isElementNode;
        my $childname = $child->getTagName
            // couldnt_get("the tag name", of=> "a $rootname child");
        $dsc->{$rootname}{$childname} = undef; # unnecessary iff we have props
        PROP:
        for my $prop ($child->getChildNodes) {
            next PROP unless $prop->isElementNode;
            my $propname = $prop->getTagName // "undef";
            die didnt_expect_anything(but=> "prop", got=> $propname)
                unless $propname eq "prop";
            my $attributes = $prop->getAttributes
                // couldnt_get("the attributes", of=> "a prop node");
            # for minimum code duplication, and maximum error handling,
            # use dataflow programming, and `map`.
            my ($Name, $Title) =
                map { $_->getNodeValue // couldnt_get("the node value", of=>"the attribute") }
                map { $attributes->getNamedItem($_) // couldnt_get("the named item $_", of=> "the prop attributes") }
                qw/Name Title/;
            my $propvalue = +{
                Name => $Name,
                Title => $Title,
            };
            push @{ $dsc->{$rootname}{$childname}{$propname} }, $propvalue;
        }
    }
    return $dsc;
}
Here are the custom error-handling subroutines that make the code above more expressive.
sub didnt_expect_anything {
    my %args = @_;
    my $expected = $args{but} // croak qq(required named argument "but" missing);
    my $role = $args{as} // "a tag name";
    my $instead = $args{got} // croak qq(required named argument "got" missing);
    croak qq(Didn't expect anything but "$expected" as $role here, got "$instead");
}
sub couldnt_get {
    my ($what, %args) = @_;
    my $of_what = $args{of} // croak qq(required named argument "of" missing);
    croak qq(Couldn't get $what of $of_what);
}
Of course, this produces the correct output, but it is not the right way to get there: use CPAN.
Part of the problem with your implementation (apart from the lack of error handling) is that you do some complicated gymnastics with your "stack".
Before the first iteration of the outer loop, @stack contains a single element, +{} (a reference to an empty hash).
The expression $stack[$#stack]->{$rootnode} accesses the last element of the stack (better written as $stack[-1]), treats that value as a hash reference, and looks up the entry named $rootnode. This evaluates to undef. That value is then pushed onto the stack. Chaos ensues.
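The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not part of the original code: it assumes the root tag name is Select (taken from the sample XML), uses no XML parsing at all, and the names %fixed and @good are made up for the illustration.

```perl
use strict;
use warnings;

my %hash;
my @stack    = (\%hash);   # same starting state as in the question
my $rootnode = 'Select';   # assumed root tag name, from the sample XML

# Looking up a missing key in the empty hash yields undef,
# and that undef is what gets pushed.
push @stack, $stack[-1]->{$rootnode};
print defined $stack[-1] ? "defined\n" : "undef\n";   # prints "undef"

# For contrast (hypothetical names): assign a fresh hash first and push
# the result, so the top of the stack is the hash stored under the key.
my %fixed;
my @good = (\%fixed);
push @good, $good[-1]{$rootnode} = {};
$good[-1]{book} = {};
print exists $fixed{Select}{book} ? "linked\n" : "not linked\n";   # prints "linked"
```

In the second half, whatever is written through the top of the stack shows up in the outer hash, which is the property a stack-based builder depends on.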