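Parsing XML with nested tags in Perl

Time: 2012-04-24 09:48:48

Tags: perl xml-parsing

I am trying to parse an XML file that contains a set of nested tags. I am using the Perl XML::Simple API; the values of single, top-level tags are parsed correctly, but I cannot get at the values of the nested tags.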

<archetype>
    <original_language></original_language>
    <description></description>
    <archetype_id></archetype_id>
    <definition></definition>
    <ontology></ontology>
</archetype>
The <definition> section contains the item details, for example:

<definition>
.
.
<node_id>at0004</node_id>
<attributes xsi:type="C_SINGLE_ATTRIBUTE">
<rm_attribute_name>value</rm_attribute_name>
+<existence> </existence>
<children xsi:type="C_DV_QUANTITY">
    <rm_type_name>DV_QUANTITY</rm_type_name>
    +<occurrences></occurrences>
    <node_id/>
    +<property></property>
    <list>
    <magnitude>
        <lower_included>true</lower_included>
        <upper_included>false</upper_included>
        <lower_unbounded>false</lower_unbounded>
        <upper_unbounded>false</upper_unbounded>
        <lower>0.0</lower>
        <upper>1000.0</upper>
</magnitude>
<units>mm[Hg]</units>
</list>
</children>
</attributes>
.
.
</definition>

From the sample file above, I want to extract content like this:

node_id -> at0004
    magnitude -> lower -> 0.0
    magnitude -> higher -> 1000.0

Please guide me on how to extract this content.

2 answers:

Answer 0 (score: 2)

You need to understand references; see perlreftut, perlref, and perldsc.
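As a quick illustration of the reference syntax those documents cover, here is a minimal sketch in core Perl. The `$root` structure is hand-written for illustration and only assumed to resemble what XML::Simple returns for the question's XML; no XML is parsed here.

```perl
use strict;
use warnings;

# Hypothetical data, shaped like a nested hash that XML::Simple might return.
my $root = {
    node_id    => 'at0004',
    attributes => {
        children => {
            list => {
                magnitude => { lower => '0.0', upper => '1000.0' },
                units     => 'mm[Hg]',
            },
        },
    },
};

# Each {...} subscript descends one level; the arrow is only required
# after the first variable, not between adjacent subscripts.
my $lower = $root->{attributes}{children}{list}{magnitude}{lower};

# Keep a reference to an inner hash to avoid repeating the long path.
my $m     = $root->{attributes}{children}{list}{magnitude};
my $upper = $m->{upper};

# Dereference the inner hash as a whole to list its keys.
my @keys = sort keys %$m;

print "lower=$lower upper=$upper keys=@keys\n";
```

The same pattern applies at any depth: follow one hash key per nesting level of the XML.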

use strictures;
use XML::Simple qw(:strict);

my $root = XMLin(<<'XML', ForceArray => 0, KeyAttr => undef);
<definition>
.
.
<node_id>at0004</node_id>
<attributes xsi:type="C_SINGLE_ATTRIBUTE">
<rm_attribute_name>value</rm_attribute_name>
+<existence> </existence>
<children xsi:type="C_DV_QUANTITY">
    <rm_type_name>DV_QUANTITY</rm_type_name>
    +<occurrences></occurrences>
    <node_id/>
    +<property></property>
    <list>
    <magnitude>
        <lower_included>true</lower_included>
        <upper_included>false</upper_included>
        <lower_unbounded>false</lower_unbounded>
        <upper_unbounded>false</upper_unbounded>
        <lower>0.0</lower>
        <upper>1000.0</upper>
</magnitude>
<units>mm[Hg]</units>
</list>
</children>
</attributes>
.
.
</definition>
XML

my $m = $root->{attributes}{children}{list}{magnitude};
printf <<'TEMPLATE', $root->{node_id}, $m->{lower}, $m->{upper};
node_id -> %s
    magnitude -> lower -> %.1f
    magnitude -> higher -> %.1f
TEMPLATE

use Data::Dump::Streamer qw(Dump); Dump $root;

Output:

node_id -> at0004
    magnitude -> lower -> 0.0
    magnitude -> higher -> 1000.0

$HASH1 = {
    attributes => {
        children => {
            content => [("\n    +") x 2],
            list    => {
                magnitude => {
                    lower           => '0.0',
                    lower_included  => 'true',
                    lower_unbounded => 'false',
                    upper           => '1000.0',
                    upper_included  => 'false',
                    upper_unbounded => 'false'
                },
                units => 'mm[Hg]'
            },
            node_id      => {},
            occurrences  => {},
            property     => {},
            rm_type_name => 'DV_QUANTITY',
            "xsi:type"   => 'C_DV_QUANTITY'
        },
        content           => "\n+",
        existence         => {},
        rm_attribute_name => 'value',
        "xsi:type"        => 'C_SINGLE_ATTRIBUTE'
    },
    content => [("\n.\n.\n") x 2],
    node_id => 'at0004'
};
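If the exact path to `magnitude` is not known in advance, a dump like the one above can also be searched recursively. The following is only a sketch in core Perl: `find_key` is a hypothetical helper name, not part of XML::Simple, and the sample data is a trimmed, hand-written copy of the dump.

```perl
use strict;
use warnings;

# Hypothetical helper: return every value stored under $key
# anywhere inside a nested structure of hashes and arrays.
sub find_key {
    my ($data, $key) = @_;
    my @found;
    if (ref $data eq 'HASH') {
        for my $k (sort keys %$data) {
            push @found, $data->{$k} if $k eq $key;
            push @found, find_key($data->{$k}, $key);   # descend further
        }
    }
    elsif (ref $data eq 'ARRAY') {
        push @found, find_key($_, $key) for @$data;
    }
    return @found;
}

# Trimmed copy of the dumped structure, written by hand.
my $dump = {
    attributes => {
        children => {
            list => {
                magnitude => { lower => '0.0', upper => '1000.0' },
                units     => 'mm[Hg]',
            },
            node_id => {},            # the empty <node_id/> element
        },
    },
    content => [ "\n.\n.\n", "\n.\n.\n" ],
    node_id => 'at0004',
};

my ($magnitude) = find_key($dump, 'magnitude');
print "lower -> $magnitude->{lower}, upper -> $magnitude->{upper}\n";
```

Note that a search for `node_id` would return both the empty inner element and the outer `at0004` value, so results may need filtering, for example with `grep { !ref }`.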

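Answer 1 (score: 1)

Here is an XML::Twig program that does it, although I made some assumptions that you may need to adjust. I don't know whether <definition> can contain more than one node_id/attributes pair, so I wrote this to handle multiple pairs: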

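#!/usr/bin/env perl

use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a <magnitude> element has been completely parsed.
        magnitude => sub {
            my $m = $_;
            my $hash = $m->simplify;
            # The matching <node_id> is the sibling that precedes the enclosing <attributes>.
            my $node_id = $m->parent( 'attributes' )->prev_sibling( 'node_id' )->text;
            # <units> comes after <magnitude>, so it is not available yet in this handler.
            print "node -> $node_id\n",
                "\tmagnitude -> lower -> $hash->{lower}\n",
                "\tmagnitude -> higher -> $hash->{upper}\n";
            },
        },
    );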

$twig->parse(*DATA);


__END__
<definition>

<node_id>at0004</node_id>
<attributes xsi:type="C_SINGLE_ATTRIBUTE">
    <rm_attribute_name>value</rm_attribute_name>
    <existence> </existence>
    <children xsi:type="C_DV_QUANTITY">
        <rm_type_name>DV_QUANTITY</rm_type_name>
        <occurrences></occurrences>
        <node_id/>
        <property></property>
        <list>
            <magnitude>
                <lower_included>true</lower_included>
                <upper_included>false</upper_included>
                <lower_unbounded>false</lower_unbounded>
                <upper_unbounded>false</upper_unbounded>
                <lower>0.0</lower>
                <upper>1000.0</upper>
            </magnitude>
            <units>mm[Hg]</units>
        </list>
    </children>
</attributes>

<node_id>at0005</node_id>
<attributes xsi:type="C_SINGLE_ATTRIBUTE">
    <rm_attribute_name>value</rm_attribute_name>
    <existence> </existence>
    <children xsi:type="C_DV_QUANTITY">
        <rm_type_name>DV_QUANTITY</rm_type_name>
        <occurrences></occurrences>
        <node_id/>
        <property></property>
        <list>
            <magnitude>
                <lower_included>true</lower_included>
                <upper_included>false</upper_included>
                <lower_unbounded>false</lower_unbounded>
                <upper_unbounded>false</upper_unbounded>
                <lower>100.9</lower>
                <upper>998.7</upper>
            </magnitude>
            <units>mm[Hg]</units>
        </list>
    </children>
</attributes>

</definition>
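
If the units are wanted alongside the bounds, one possible variation is to attach the handler to `<list>` rather than `<magnitude>`, because `<units>` is only parsed after `</magnitude>`. This is a sketch under assumptions, not the answer's own method: the XML is a cut-down version of the sample, and the `@rows` array and its field names are made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Cut-down sample document (assumed structure, taken from the question).
my $xml = <<'XML';
<definition>
<node_id>at0004</node_id>
<attributes>
    <children>
        <node_id/>
        <list>
            <magnitude>
                <lower>0.0</lower>
                <upper>1000.0</upper>
            </magnitude>
            <units>mm[Hg]</units>
        </list>
    </children>
</attributes>
</definition>
XML

my @rows;    # hypothetical result store, one hash per <list>

my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires at </list>, when both <magnitude> and <units> are complete.
        list => sub {
            my ($t, $list) = @_;
            my $m = $list->first_child('magnitude') or return;
            my $node_id = $list->parent('attributes')
                               ->prev_sibling('node_id')->text;
            push @rows, {
                node_id => $node_id,
                lower   => $m->first_child_text('lower'),
                upper   => $m->first_child_text('upper'),
                units   => $list->first_child_text('units'),
            };
        },
    },
);

$twig->parse($xml);

printf "%s: %s to %s %s\n", @{$_}{qw(node_id lower upper units)} for @rows;
```

Collecting the rows into an array instead of printing inside the handler keeps parsing and reporting separate, which may be easier to adapt if the output format changes.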