I am trying to parse a huge XML file that contains many similar tags. So far I can only get at the first tag, using first_child.
Here is a sample of the XML:
<?xml version="1.0" encoding="UTF-8"?>
<test version="1.0">
<parameters/>
<category name="z1" description="jobs currently running" count="30" timestamp="2010-01-16T14:24:31">
<jobs name="ZEI018CL" owner="A" type="auto" activityLevel="147" threadId="202" pid="20521" vmName="Subs@xx.xxx.xx.xxx:6102:xxx" cpuUsage="0"/>
<job name="ZUA002B" owner="A" type="auto" activityLevel="3375" threadId="194" pid="20521" vmName="Subs@xx.xxx.xx.xxx:6102:xxx" cpuUsage="0"/>
<job name="ZZZ855" owner="A" type="auto" activityLevel="0" threadId="107" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
<job name="ZKA019CL" owner="A" type="auto" activityLevel="0" threadId="105" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
<job name="ZIN41B" owner="A" type="auto" activityLevel="3" threadId="104" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
<job name="ZIN198CL" owner="A" type="auto" activityLevel="0" threadId="103" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
<job name="ZHO060" owner="A" type="auto" activityLevel="61" threadId="102" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
<job name="ZEI019CL" owner="A" type="auto" activityLevel="0" threadId="101" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
<job name="ZEI013CL" owner="A" type="auto" activityLevel="0" threadId="99" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
<job name="ZEI011CL" owner="A" type="auto" activityLevel="0" threadId="98" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
<job name="ZEC007CL" owner="A" type="auto" activityLevel="0" threadId="97" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
<job name="ZEC001B" owner="A" type="auto" activityLevel="2" threadId="96" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/></category>
<category name="z3" description="Batchjobs" count="0" timestamp="2015-01-16T14:24:31"/>
<category name="z4" description="Interactivejobs jobs currently running in the system" count="498" timestamp="2015-01-16T14:24:31">
<job name="CAS" owner="PA" type="interactive" activityLevel="0" threadId="14624" pid="23771" vmName="Subs@xx.xxx.xx.xxx:6104:xxx" cpuUsage="0"/>
<job name="CR" owner="K" type="interactive" activityLevel="0" threadId="14586" pid="23771" vmName="Subs@xx.xxx.xx.xxx:6104:xxx" cpuUsage="0"/>
<job name="MM" owner="DU" type="interactive" activityLevel="0" threadId="14570" pid="23771" vmName="Subs@xx.xxx.xx.xxx:6104:xxx" cpuUsage="0"/>
<job name="ZZ" owner="D" type="interactive" activityLevel="0" threadId="14568" pid="23771" vmName="Subs@xx.xxx.xx.xxx:6104:xxx" cpuUsage="0"/></category>
<category name="services" description="The status" timestamp="2015-01-16T14:24:31">
<service name="1" description="test1" port-status="up" thread-status="up"/>
<service name="2" description="test2" port-status="up" thread-status="up"/>
<service name="3" description="test3" port-status="N/A" thread-status="up"/>
<service name="4" description="test4" port-status="up" thread-status="up"/></category></test>
To load the file I use:
my $parser = XML::Twig->new();
$parser->parsefile($xml);
To read the count attribute of the first category I use:
my $count = $parser->root->first_child('category')->att('count');
print $count;
And for the name of the first job in that category:
my $service = $parser->root->first_child('category')->first_child('job')->att('name');
print $service;
But I cannot figure out how to get the port-status for a specific service name, or, for a specific job name, the type in the second tag.
Can you help me?
Answer 0 (score: 1)
In your case, the simplest option is probably to use XPath to get what you want:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig::XPath;
my( $service, $infile)= @ARGV;
# build the twig with XML::Twig::XPath (not plain XML::Twig) to get full XPath support
my $t= XML::Twig::XPath->new()
                       ->parsefile( $infile);
# get the service first, then the attribute
# note the \@'s, where Perl and XPath syntaxes collide
my @services= $t->findnodes( qq{//service[\@name="$service"]});
die "no service named '$service'\n" unless @services;
my $status= $services[0]->att( 'port-status');
print "status: $status\n";
# get it in a single XPath query
my $status2= $t->findvalue( qq{//service[\@name="$service"]/\@port-status});
print "status: $status2\n";
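The other half of the question (the type of a job with a given name) can probably be handled the same way. The following is only a sketch: job_type is a hypothetical helper name, and it assumes job names are unique enough that the first match is the one you want.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig::XPath;

# Hypothetical helper (not part of XML::Twig): returns the type attribute
# of the first <job> whose name matches, or undef if there is none.
sub job_type {
    my( $infile, $job)= @_;
    my $t= XML::Twig::XPath->new()->parsefile( $infile);
    # to limit the search to one category, prefix the query with
    # //category[\@name="z4"] (or whichever category name applies)
    my( $elt)= $t->findnodes( qq{//job[\@name="$job"]});
    return $elt ? $elt->att( 'type') : undef;
}

# usage: script.pl <job name> <xml file>
if( @ARGV == 2) {
    my( $job, $infile)= @ARGV;
    my $type= job_type( $infile, $job);
    print "type: ", ( defined $type ? $type : 'not found'), "\n";
}
```

The job name is interpolated straight into the XPath string, so a name containing a double quote would break the query; that is fine for a quick script but worth checking for untrusted input.';
close $fh;
die "wrong type for CAS" unless job_type( $file, 'CAS') eq 'interactive';
die "wrong type for CR" unless job_type( $file, 'CR') eq 'auto';
die "expected undef for missing job" if defined job_type( $file, 'NOPE');
</test>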
If your XML file is really huge, then depending on what you need to do, handlers may be a better option. It is hard to tell from your example.
Answer 1 (score: 0)
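As a rough sketch of the handler approach, assuming you only need one service's port-status: twig_roots tells XML::Twig to build only the service elements, and purge releases what has already been processed, so the whole document never has to sit in memory. The helper name port_status is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns the port-status of the named service,
# or undef if no such service is found.
sub port_status {
    my( $infile, $service)= @_;
    my $status;
    my $t= XML::Twig->new(
        # only <service> elements are built; the rest of the file is skipped
        twig_roots => {
            service => sub {
                my( $twig, $elt)= @_;
                $status= $elt->att( 'port-status')
                    if ( $elt->att( 'name') // '') eq $service;
                $twig->purge;    # free the memory used by what was parsed so far
            },
        },
    );
    $t->parsefile( $infile);
    return $status;
}

# usage: script.pl <service name> <xml file>
if( @ARGV == 2) {
    my( $service, $infile)= @ARGV;
    my $status= port_status( $infile, $service);
    print "status: ", ( defined $status ? $status : 'not found'), "\n";
}
```

Whether this is worth the extra code depends on the file size; for a file that fits comfortably in memory the XPath version is shorter.';
close $fh;
die "wrong status for 1" unless port_status( $file, '1') eq 'up';
die "wrong status for 3" unless port_status( $file, '3') eq 'N/A';
die "expected undef for missing service" if defined port_status( $file, '9');
</test>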
My guess is that you want something like this:
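# the sample XML has <service> elements nested inside <category>, so search descendants
foreach ($parser->root->descendants('service[@name="1"]')){
print join(", ", @{$_->atts}{'port-status', 'thread-status'}), "\n";
}
With descendants('service[@name="1"]') you get all service elements whose name attribute is 1.
You then use the atts method to get a hash reference of that element's attributes and extract port-status and thread-status from it.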
Edit: sorry, fixed. I forgot that more than one element can be returned.