How to extract XML and convert it into a Perl data structure

Time: 2011-12-13 17:14:59

Tags: xml perl

I have XML files in a folder, and I need to extract some information from them and store it in a hash. My XML files look like this:

<?xml version="1.0" encoding="UTF-8"?>
<Servicemodule xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Service Id="478" Name="Pump Motor">
<Description>It delivers actual pump speed</Description>
<ServiceCustomers>
   <SW Service="SKRM" Path="/work/hr_service.xml"/>
</ServiceCustomers>
<ServiceSuppliers>
   <HW Type="s" Nr="12" Service="1" Path="/work/hardware.xml"/>
   <HW Type="v" Nr="2" Service="1" Path="/work/hardware.xml"/> 
   <HW Type="mt" Nr="1" Service="1" Path="/work/hardware.xml"/>
 </ServiceSuppliers>
 </Service>
 </Servicemodule>

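I want to store this information in a hash, with the service Id as the key and the remaining information as an array of hashes under that key. The attributes of the SW and HW elements inside the ServiceCustomers and ServiceSuppliers elements should go into the array of values for that key (the service Id). This is probably easy for an expert, but I am a beginner and this problem has given me a lot of trouble. I tried this: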

use strict;
use warnings;
use feature ':5.10';
use XML::Twig;
use File::Find;

my $num = 0;
my %combeh;
my $dir = "V:/Main/work";
find(\&wanted, $dir);

sub wanted() {
    if ( -f and /(_service\.xml)$/ ) {    # find all the files with a suffix of .xml
        my $tweak_server = sub {
            my @bhi;
            my ( $twig, $root ) = @_;
            my $code     = $root->first_child_text('Service Id');
            my $ser      = $root->first_child('ServiceCustomers');
            my $ser_cnt  = $root->first_child_text('SW');
            my $ser1     = $root->first_child('ServiceSuppliers');
            my $ser1_cnt = $root->first_child_text('HW');
            if ($ser) {
                push( @bhi, $ser->toString, $File::Find::name );
                $combeh{$code} = [@bhi];
            }
            if ($ser1) {
                push( @bhi, $ser1->toString, $File::Find::name );
                $combeh{$code} = [@bhi];
            };
            my $roots    = { Service => 1 };
            my $handlers = { 'Servicemodule/Service' => $tweak_server, };
            my $twig     = new XML::Twig(
                TwigRoots    => $roots,
                TwigHandlers => $handlers,
                pretty_print => 'indented'
            );
            $twig->parsefile($_);
        }
    }
    return (%combeh);
}

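I cannot build the hash with the script above. Please help me with how to get the attribute values and store them in a hash. The output I need looks like this: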

 '478' => [
     { Description => 'It delivers actual pump speed' },
     { Service => 'SKRM', Path => '/work/hr_service.xml' },
     { Type => 's',  Nr => '12', Service => '1', Path => '/work/hardware.xml' },
     { Type => 'v',  Nr => '2',  Service => '1', Path => '/work/hardware.xml' },
     { Type => 'mt', Nr => '1',  Service => '1', Path => '/work/hardware.xml' },
     ...
     ...
     ....

Please help me solve this problem.

Thanks in advance.

After your suggestion, I tried this:

#!/usr/bin/perl
use warnings;
use strict;
use XML::Simple;
use Carp;
use File::Find;
use File::Spec::Functions qw( canonpath );
use Data::Dumper;

# @ARGV is a global and cannot be declared with "my"
@ARGV = ("C:/Main/work");
die "Need directories\n" unless @ARGV;

find(
    sub {
        return unless ( /(_service\.xml)$/ and -f );
        extract_information();
        return;
    },
    @ARGV
);

sub extract_information {
    my $path = $_;

    my $xml          = XMLin($path);
    my $xml_services = $xml->{Service};
    my %services;
    for my $xml_service (@$xml_services) {

        my %service = (
            description => $xml_service->{Description},
            name        => $xml_service->{Name},
            id          => $xml_service->{Id},
        );

        $service{sw} = _maybe_list( $xml_service->{ServiceCustomers}{SW} );
        $service{hw} = _maybe_list( $xml_service->{ServiceSuppliers}{HW} );
        $service{sw} = _maybe_list( $xml_service->{ServiceSuppliers}{SW} );
        $services{ $service{id} } = \%service;
    }

    print Dumper \%services;
}

sub _maybe_list {
    my $maybe = shift;
    return ref $maybe eq 'ARRAY' ? $maybe : [$maybe];
}

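Thanks for your reply. I am new to XML::Simple; I studied the module and understood your script. But when I run your code, I get an error like "Not an ARRAY reference" on the for-loop line. I tried different ways to get around it but still get the same error. Sometimes I have both SW and HW elements inside ServiceSuppliers, so I added a line in the same format as yours. I also have a question: you said "If there's a single element in the XML it won't be wrapped", but sometimes ServiceCustomers contains only one element with some attributes, as in the XML file I showed. Is that a problem? If so, what should I do? Could you help me with these issues?

Can anyone please help me fix this error?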

1 Answer:

Answer 0: (score: 4)

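If the XML files are not too large, you can transform them more easily with XML::Simple.

The advantage of XML::Simple is that manipulating a Perl data structure is much more convenient than manipulating XML.

The disadvantage is that it uses more memory, because it has to load the entire XML file into memory. It is also sensitive to the case of element and attribute names in the XML.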
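If memory is the concern, a streaming parser sidesteps it. The following is only a minimal sketch, assuming XML::Twig (which the question already uses) and the element and attribute names from the sample file. The inline `$sample` string and the `%services` name are illustrative stand-ins, not part of the original scripts. It builds the array-of-hashes layout the question asks for, one Service element at a time.

```perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

# Inline copy of the sample document (illustrative stand-in for a file on disk).
my $sample = <<'XML';
<Servicemodule xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Service Id="478" Name="Pump Motor">
    <Description>It delivers actual pump speed</Description>
    <ServiceCustomers>
      <SW Service="SKRM" Path="/work/hr_service.xml"/>
    </ServiceCustomers>
    <ServiceSuppliers>
      <HW Type="s" Nr="12" Service="1" Path="/work/hardware.xml"/>
      <HW Type="v" Nr="2" Service="1" Path="/work/hardware.xml"/>
      <HW Type="mt" Nr="1" Service="1" Path="/work/hardware.xml"/>
    </ServiceSuppliers>
  </Service>
</Servicemodule>
XML

my %services;    # service Id => array of hashes

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each completely parsed <Service> element.
        Service => sub {
            my ( $t, $service ) = @_;
            my @entries =
              ( { Description => $service->first_child_text('Description') } );

            # Copy the attributes of every SW and HW child of both containers.
            for my $container_name (qw(ServiceCustomers ServiceSuppliers)) {
                my $container = $service->first_child($container_name) or next;
                for my $tag (qw(SW HW)) {
                    for my $item ( $container->children($tag) ) {
                        push @entries, { %{ $item->atts || {} } };
                    }
                }
            }

            $services{ $service->att('Id') } = \@entries;
            $t->purge;    # free the part of the document handled so far
        },
    },
);

$twig->parse($sample);    # for a file on disk, use $twig->parsefile($path)

print Dumper \%services;
```

The handler is registered on `Service` and reads the Id with `att('Id')`, since Id is an attribute rather than a child element. For files that fit comfortably in memory, the XML::Simple version that follows is shorter.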

use strict;
use warnings;

use XML::Simple;
use Data::Dumper;

process_service_xml(shift);

sub process_service_xml {
    my $xml = XMLin(shift);

    # Illustrating what you've got after XML::Simple processes it.
    print "******* XML::Simple input ********\n";
    print Dumper $xml;
    print "**********************************\n";

    # Pull out the Services
    my $xml_services = $xml->{Service};

    # Iterate through each Service to transform them
    my %services;
    for my $xml_service (@$xml_services) {
        # Pull out the basic information
        my %service = (
            description     => $xml_service->{Description},
            name            => $xml_service->{Name},

            # Redundant with the key, but useful to keep all the data about the
            # service in one place.
            id              => $xml_service->{Id},
        );

        # Get SW and HW as their own attributes.
        # If there's a single element in the XML it won't be wrapped in
        # an array, so make sure each are a list.
        $service{sw} = _maybe_list( $xml_service->{ServiceCustomers}{SW} );
        $service{hw} = _maybe_list( $xml_service->{ServiceSuppliers}{HW} );

        # Store the service in the larger hash, keyed by the ID.
        $services{ $service{id} } = \%service;
    }

    # And here's what the information has been transformed into.
    print "******* Services ********\n";
    print Dumper \%services;
    print "*************************\n";    
}

sub _maybe_list {
    my $maybe = shift;
    return ref $maybe eq 'ARRAY' ? $maybe : [$maybe];
}
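With the sample file from the question, which contains a single Service element, `$xml->{Service}` comes back as a hash reference rather than an array reference, so `@$xml_services` dies with "Not an ARRAY reference". One possible way around it, sketched here under the assumption that XML::Simple's `ForceArray` and `KeyAttr` options are acceptable and with an inline `$sample` string standing in for the file, is to force Service, SW, and HW to always be arrays and to switch off key folding so the result has a predictable shape.

```perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Inline copy of the sample document (illustrative stand-in for a file on disk).
my $sample = <<'XML';
<Servicemodule xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Service Id="478" Name="Pump Motor">
    <Description>It delivers actual pump speed</Description>
    <ServiceCustomers>
      <SW Service="SKRM" Path="/work/hr_service.xml"/>
    </ServiceCustomers>
    <ServiceSuppliers>
      <HW Type="s" Nr="12" Service="1" Path="/work/hardware.xml"/>
      <HW Type="v" Nr="2" Service="1" Path="/work/hardware.xml"/>
      <HW Type="mt" Nr="1" Service="1" Path="/work/hardware.xml"/>
    </ServiceSuppliers>
  </Service>
</Servicemodule>
XML

my $xml = XMLin(
    $sample,
    ForceArray => [qw(Service SW HW)],    # always arrays, even for one element
    KeyAttr    => [],                     # no folding of arrays into hashes
);

my %services;
for my $xml_service ( @{ $xml->{Service} } ) {
    my %service = (
        id          => $xml_service->{Id},
        name        => $xml_service->{Name},
        description => $xml_service->{Description},
    );

    # Collect SW and HW from both containers, since either may hold either kind.
    for my $tag (qw(SW HW)) {
        $service{ lc $tag } = [
            map { @{ $xml_service->{$_}{$tag} || [] } }
              qw(ServiceCustomers ServiceSuppliers)
        ];
    }

    $services{ $service{id} } = \%service;
}

print Dumper \%services;
```

With `ForceArray` in place the `_maybe_list` helper is no longer needed. Note that `_maybe_list` wraps a missing element as `[undef]`, whereas this sketch yields an empty list for it.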