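I have to parse several XML files with Perl and store the values in a hash. If possible, I would like to filter on certain attributes. Later in my code I pull the data out of the hash and insert it into a database.
I have been using XML::Parser, but I would rather parse into a hash than handle every tag as it is encountered. Any suggestions?
I want to skip any path that has the attribute kind="dir". For each remaining path I need the author, date, message and file type (file extension). There can be any number of <path> elements, and kind can be either "file" or "dir". There can also be multiple <logentry> elements.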
<?xml version="1.0" encoding="UTF-8"?>
<log>
<logentry revision="3989">
<author>cergyl</author>
<date>2013-07-19T05:31:01.212620Z</date>
<paths>
<path action="M" kind="dir">/team.admin/trunk/auth.conf</path>
</paths>
<path action="M" kind="file">/team.admin/trunk/file.cpp</path>
<msg>Whitespace change to verify repository synchronization</msg>
</logentry>
</log>
my $XML_Parser = XML::Parser->new(
Handlers => {
Start => \&hdl_xml_tag_start,
End => \&hdl_xml_tag_end,
Char => \&hdl_xml_nonmarkup_char,
Default => \&hdl_xml_default
}
);
# This event is generated when an XML start tag is recognized. Parser is an XML::Parser::Expat instance.
sub hdl_xml_tag_start
{
my ( $parser, $element, %attributes ) = @_;
$attributes{ '_str' } = "$element:";
$XML_Attributes_Hash_Ref = \%attributes;
return;
}
# This event is generated when an XML end tag is recognized. Note that an XML empty tag (<foo/>) generates both a start and an end event.
sub hdl_xml_tag_end
{
my ( $parser, $element ) = @_;
#format_message($XML_Attributes_Hash_Ref);
format_svn_history( $XML_Attributes_Hash_Ref );
return;
}
# This event is generated when non-markup is recognized. The non-markup sequence of characters is in String.
# A single non-markup sequence of characters may generate multiple calls to this handler.
sub hdl_xml_nonmarkup_char
{
my ( $parser, $string ) = @_;
$XML_Attributes_Hash_Ref->{ '_str' } .= $string;
return;
}
#This is called for any characters that don't have a registered handler.
sub hdl_xml_default { return; }
Answer 0 (score: 2)
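With the limited information you have given it is hard to write a complete solution, but here is some code that uses XML::Twig to process the XML data you show and print the contents of every path element (there is only one) whose kind attribute is not equal to dir.
XML::LibXML is another option; it is based on libxml2, a library written in C.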
use strict;
use warnings;
use XML::Twig;
my $parser = XML::Twig->new(
twig_handlers => {
path => \&path_handler,
}
);
$parser->parse(*DATA);
sub path_handler {
my ($twig, $path) = @_;
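# Skip directories; default to '' so a path without a kind attribute does not raise a warning
return if ($path->att('kind') // '') eq 'dir';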
print $path->text, "\n";
}
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<log>
<logentry revision="3989">
<author>cergyl</author>
<date>2013-07-19T05:31:01.212620Z</date>
<paths>
<path action="M" kind="dir">/team.admin/trunk/auth.conf</path>
</paths>
<path action="M" kind="file">/team.admin/trunk/file.cpp</path>
<msg>Whitespace change to verify repository synchronization</msg>
</logentry>
</log>
Output
/team.admin/trunk/file.cpp
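Since the question asks for a hash holding the author, date, message and file extension, the same XML::Twig approach could be extended roughly as follows. This is only a sketch under a few assumptions: the hash layout (`%log` keyed by revision), the handler name `logentry_handler`, and the extension regex are illustrative choices, not something the question specifies.

```perl
use strict;
use warnings;
use XML::Twig;

# Sample data copied from the question
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<log>
<logentry revision="3989">
<author>cergyl</author>
<date>2013-07-19T05:31:01.212620Z</date>
<paths>
<path action="M" kind="dir">/team.admin/trunk/auth.conf</path>
</paths>
<path action="M" kind="file">/team.admin/trunk/file.cpp</path>
<msg>Whitespace change to verify repository synchronization</msg>
</logentry>
</log>
XML

# Assumed layout: revision => { author, date, msg, files => [ { path, ext } ] }
my %log;

# Called once for each complete <logentry> element
sub logentry_handler {
    my ($twig, $entry) = @_;

    my @files;
    for my $path ($entry->descendants('path')) {
        # Skip directories
        next if ($path->att('kind') // '') eq 'dir';

        my $name = $path->text;
        # Extension = text after the last dot of the final path segment
        my ($ext) = $name =~ /\.([^.\/]+)$/;
        push @files, { path => $name, ext => $ext // '' };
    }

    $log{ $entry->att('revision') } = {
        author => $entry->first_child_text('author'),
        date   => $entry->first_child_text('date'),
        msg    => $entry->first_child_text('msg'),
        files  => \@files,
    };

    # Release the memory used by entries that are already processed
    $twig->purge;
}

XML::Twig->new(
    twig_handlers => { logentry => \&logentry_handler },
)->parse($xml);

# Walk the hash, e.g. as a starting point for the database inserts
for my $rev (sort keys %log) {
    my $entry = $log{$rev};
    for my $file (@{ $entry->{files} }) {
        print join("\t", $rev, $entry->{author}, $entry->{date},
                   $file->{ext}, $file->{path}, $entry->{msg}), "\n";
    }
}
```

Handling whole logentry elements rather than individual path elements keeps the author, date and message together with the paths they belong to, which is what the later database insert needs.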
Answer 1 (score: 0)
Personally, I like XML::DOM::Parser from XML::DOM, but I use XML::Twig to pretty-print the documents.
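use strict;
use warnings;
use XML::DOM;
use XML::Twig;
my $xp = XML::DOM::Parser->new();
# Parse an XML string, then release the DOM tree
my $doc = $xp->parse("<xml></xml>");
$doc->dispose();
# Parse an XML file, then release the DOM tree
$doc = $xp->parsefile("file.xml");
$doc->dispose();
# Pretty-print a poorly formatted XML document
my $xpp = XML::Twig->new(pretty_print => 'indented');
$xpp->parse("<xml></xml>");
$xpp->print();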
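To tie this back to the question, the DOM approach might be applied to the sample data along these lines. It is a rough sketch, not a tested solution: the `@files` array is a name introduced here for illustration, and the code assumes each path element contains a single text node.

```perl
use strict;
use warnings;
use XML::DOM;

# Sample data copied from the question
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<log>
<logentry revision="3989">
<author>cergyl</author>
<date>2013-07-19T05:31:01.212620Z</date>
<paths>
<path action="M" kind="dir">/team.admin/trunk/auth.conf</path>
</paths>
<path action="M" kind="file">/team.admin/trunk/file.cpp</path>
<msg>Whitespace change to verify repository synchronization</msg>
</logentry>
</log>
XML

my $doc   = XML::DOM::Parser->new->parse($xml);
my $nodes = $doc->getElementsByTagName('path');

# Collect the text of every <path> that is not a directory
my @files;
for my $i (0 .. $nodes->getLength - 1) {
    my $node = $nodes->item($i);
    next if $node->getAttribute('kind') eq 'dir';

    # Assumption: the first child is the text node holding the path
    my $text = $node->getFirstChild;
    push @files, $text->getData if defined $text;
}

# Release the DOM tree
$doc->dispose;

print "$_\n" for @files;
```

Unlike the handler-based parsers, this loads the whole document into memory first, so it is probably better suited to small log files.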