Perl XML::Twig - Extract fields with similar tags

Time: 2015-01-27 12:47:50

Tags: xml perl xml-twig

I am trying to parse a huge XML file that contains many similar tags. At the moment I can only reach the first tag and its first_child.

Here is a sample of the XML:

<?xml version="1.0" encoding="UTF-8"?>
<test version="1.0">
  <parameters/>
  <category name="z1" description="jobs currently running" count="30" timestamp="2010-01-16T14:24:31">
    <jobs name="ZEI018CL" owner="A" type="auto" activityLevel="147" threadId="202" pid="20521" vmName="Subs@xx.xxx.xx.xxx:6102:xxx" cpuUsage="0"/>
    <job name="ZUA002B" owner="A" type="auto" activityLevel="3375" threadId="194" pid="20521" vmName="Subs@xx.xxx.xx.xxx:6102:xxx" cpuUsage="0"/>
    <job name="ZZZ855" owner="A" type="auto" activityLevel="0" threadId="107" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
    <job name="ZKA019CL" owner="A" type="auto" activityLevel="0" threadId="105" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
    <job name="ZIN41B" owner="A" type="auto" activityLevel="3" threadId="104" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
    <job name="ZIN198CL" owner="A" type="auto" activityLevel="0" threadId="103" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
    <job name="ZHO060" owner="A" type="auto" activityLevel="61" threadId="102" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
    <job name="ZEI019CL" owner="A" type="auto" activityLevel="0" threadId="101" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
    <job name="ZEI013CL" owner="A" type="auto" activityLevel="0" threadId="99" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
    <job name="ZEI011CL" owner="A" type="auto" activityLevel="0" threadId="98" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
    <job name="ZEC007CL" owner="A" type="auto" activityLevel="0" threadId="97" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/>
    <job name="ZEC001B" owner="A" type="auto" activityLevel="2" threadId="96" pid="20457" vmName="Subs@xx.xxx.xx.xxx:6101:xxx" cpuUsage="0"/></category>
 <category name="z3" description="Batchjobs" count="0" timestamp="2015-01-16T14:24:31"/>
  <category name="z4" description="Interactivejobs jobs currently running in the system" count="498" timestamp="2015-01-16T14:24:31">
    <job name="CAS" owner="PA" type="interactive" activityLevel="0" threadId="14624" pid="23771" vmName="Subs@xx.xxx.xx.xxx:6104:xxx" cpuUsage="0"/>
    <job name="CR" owner="K" type="interactive" activityLevel="0" threadId="14586" pid="23771" vmName="Subs@xx.xxx.xx.xxx:6104:xxx" cpuUsage="0"/>
    <job name="MM" owner="DU" type="interactive" activityLevel="0" threadId="14570" pid="23771" vmName="Subs@xx.xxx.xx.xxx:6104:xxx" cpuUsage="0"/>
    <job name="ZZ" owner="D" type="interactive" activityLevel="0" threadId="14568" pid="23771" vmName="Subs@xx.xxx.xx.xxx:6104:xxx" cpuUsage="0"/></category>
 <category name="services" description="The status" timestamp="2015-01-16T14:24:31">
    <service name="1" description="test1" port-status="up" thread-status="up"/>
    <service name="2" description="test2" port-status="up" thread-status="up"/>
    <service name="3" description="test3" port-status="N/A" thread-status="up"/>
    <service name="4" description="test4" port-status="up" thread-status="up"/></category></test>

To parse the file I use:

my $parser = XML::Twig->new();
$parser->parsefile($xml);

For the first line I use:

my $count = $parser->root->first_child('category')->att('count');
print $count;

And for the next line, this:

my $service = $parser->root->first_child('category')->first_child('job')->att('name');
print $service;

But I cannot figure out how to get the port-status for a specific service name, or, for a specific job name, the type from the second category tag.

Can you help me?

2 answers:

Answer 0: (score: 1)

In your case, the simplest approach is probably to use XPath to get what you want:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig::XPath;

my( $service, $infile)= @ARGV;

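# XML::Twig::XPath->new (not XML::Twig->new) is what enables full XPath,
# including the attribute step used in the second query below
my $t= XML::Twig::XPath->new()
                       ->parsefile( $infile);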

# get the service first, then the attribute
# note the \@'s, where Perl and XPath syntaxes collide
my @services= $t->findnodes( qq{//service[\@name="$service"]});
my $status= $services[0]->att( 'port-status');
print "status: $status\n";

# get it in one swell XPath query
my $status2= $t->findvalue( qq{//service[\@name="$service"]/\@port-status});
print "status: $status2\n";

If your XML file is really huge, then depending on what you need to do, handlers may be a better option. It is hard to tell from your example.
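
As a rough illustration of the handler approach, here is a minimal sketch, assuming only the port-status of one named service is needed. The helper name port_status_for and the shortened inline XML are made up for this example. It uses twig_roots so that only service elements are built, and purge to release memory as parsing proceeds.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: return the port-status of the <service> called $name,
# or undef if there is none. $xml is an XML string here; for a file on disk,
# call ->parsefile with the file name instead of ->parse.
sub port_status_for {
    my ($xml, $name) = @_;
    my $status;
    my $twig = XML::Twig->new(
        # twig_roots: only <service> elements are built, the rest is skipped
        twig_roots => {
            service => sub {
                my ($t, $elt) = @_;
                $status = $elt->att('port-status')
                    if defined $elt->att('name') && $elt->att('name') eq $name;
                $t->purge;    # free what has been parsed so far
            },
        },
    );
    $twig->parse($xml);
    return $status;
}

# Shortened stand-in for the sample document in the question
my $xml = <<'XML';
<test version="1.0">
  <category name="services" description="The status">
    <service name="1" description="test1" port-status="up" thread-status="up"/>
    <service name="3" description="test3" port-status="N/A" thread-status="up"/>
  </category>
</test>
XML

my $status = port_status_for($xml, '3');
print "status: $status\n";
```

Because each element is discarded once its handler has run, memory use should stay roughly flat however many job elements the file contains.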

Answer 1: (score: 0)

My guess is that you want something like this:

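# the <service> elements sit inside <category>, not directly under the root,
# so search the descendants rather than the children
foreach ($parser->root->descendants('service[@name="1"]')) {
  print join(", ", @{$_->atts}{'port-status', 'thread-status'}), "\n";
}

With descendants('service[@name="1"]') you get every service element whose name attribute is 1.

You then use the atts method to get a hash reference of that element's attributes, and take a hash slice of it to extract port-status and thread-status.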
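
The other half of the question, the type of a given job inside a given category, can be handled with the same kind of attribute condition. The following is only a sketch: the category name z4 and the job name CR are picked from the sample data, and the inline XML is a shortened stand-in for the real file.

```perl
use strict;
use warnings;
use XML::Twig;

# Shortened stand-in for the sample document in the question
my $xml = <<'XML';
<test version="1.0">
  <category name="z1" count="1">
    <job name="ZUA002B" owner="A" type="auto"/>
  </category>
  <category name="z4" count="2">
    <job name="CAS" owner="PA" type="interactive"/>
    <job name="CR" owner="K" type="interactive"/>
  </category>
</test>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);

# Narrow down to the category first, then pick the job by name.
# first_child returns undef when nothing matches, hence the checks.
my $category = $twig->root->first_child('category[@name="z4"]');
my $job      = $category ? $category->first_child('job[@name="CR"]') : undef;
my $type     = $job ? $job->att('type') : undef;

print "type: ", (defined $type ? $type : 'not found'), "\n";
```

Selecting the category before the job keeps the lookup unambiguous if the same job name appears in more than one category.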

Edit: sorry, fixed; I forgot that you have more than one child.