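Processing multiple XML "documents" in a single file with Perl

Time: 2011-10-06 15:31:00

Tags: xml perl

Edit: Sorry, I mistakenly typed 'name' when I meant 'ref', and I have now included the full attributes.

I have some XML files that contain a complete XML document on a single line. An example is: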

<Reqeusts>
    <WRRequest><Request domain="foo.com"><Rows><Row includeascolumn="n" interval="hour" ref="time" type="group"/><Row includeascolumn="n"  ref="domain_id" type="group"/><Row />...</Rows><Columns><Column ref="user_id"/><Column ref="country_id"/><Column ref="country_name"/>...</Columns></Request></WRRequest>
.
.
.
</Requests>

For clarity, I have left out many of the attributes.

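I'm using XML::Parser and XML::SimpleObject, which mostly works. For example, I'm just printing out the attributes of each element, and that is fine until I try to print the 'ref' attribute of the Column elements. Then I get an "uninitialized variable" error. The code is: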

#!/usr/bin/perl
use warnings;
use diagnostics;
use XML::Parser;
use XML::SimpleObject;
use Cwd;


if ($ARGV[0] eq "") {
  die "usage: sumXML.pl <input file> \n";
}

my $fileName = $ARGV[0];

my $parser = new XML::Parser(Style => 'Tree');
my $xso = XML::SimpleObject->new( $parser->parsefile("$fileName") );


foreach my $wrRequest ($xso->child('WRRequests')->children('RWRequest')) {
  print "Client Name: " . $wrRequest->attribute('clientName') . "\n";
  foreach my $xmlRequest ($wrRequest->child('REQUEST')) {
    print "Domain name: " . $xmlRequest->attribute('domain') . "\n";
    print "Service: " . $xmlRequest->attribute('service') . "\n";
    foreach my $xmlRow ($xmlRequest->child('ROWS')->children('ROW')) {
      print "Row Reference: " . $xmlRow->attribute('ref') . "\n";
    }
    foreach my $xmlColumn ($xmlRequest->child('COLUMNS')->children('COLUMN')) {
      print "Column Reference: " . $xmlColumn->attribute('ref') . "\n";
    }
  }
  print "\n";
}

2 answers:

Answer 0 (score: 1)

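Your sample data doesn't parse (even if you remove the dots), so it isn't valid XML. I'm not sure what your actual data looks like, but that matters a great deal for finding the problem.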
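One way to confirm whether a document parses at all is to run it through XML::Parser inside an eval and look at the error. The following is a minimal sketch, assuming a shortened copy of the question's sample (the `$sample` string is made up for illustration and keeps the mismatched root tag names):

```perl
use strict;
use warnings;
use XML::Parser;

# Shortened, hypothetical copy of the question's sample.
# The opening and closing root tags are spelled differently.
my $sample = <<'XML';
<Reqeusts>
  <WRRequest><Request domain="foo.com"/></WRRequest>
</Requests>
XML

# No Style and no Handlers: the parse only checks well-formedness.
my $parser = XML::Parser->new;

# parse() dies on the first error, so wrap it in eval.
my $ok    = eval { $parser->parse($sample); 1 };
my $error = $ok ? '' : $@;

print $ok ? "well-formed\n" : "not well-formed: $error";
```

The error message includes the line and column where parsing stopped, which is usually enough to spot a misspelled or unclosed tag.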

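I'm sure there is nothing wrong with XML::Parser or XML::SimpleObject. So please check the following:

  • Did you spell the elements/attributes correctly? (Remember that XML is case-sensitive.)
  • Do the elements/attributes actually exist? (For example: does every REQUEST element have a service attribute? Does every ROW have a ref attribute?) If they don't exist, you have to either reject the input data or work with the data you have. That of course depends on your requirements.
  • Optional: validate the XML document tree against a DTD or XSD to verify data integrity. This is like an advanced version of the second point.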
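The first two checks can be done mechanically. Below is a rough sketch, not part of the original answer, that uses an XML::Parser Start handler to count the element names exactly as they are spelled in the data and to flag elements lacking a 'ref' attribute. The sample input and the `%wants_ref` list (which elements are assumed to need 'ref') are made up for illustration:

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample input; element names mirror the question's data.
my $xml = <<'XML';
<Requests>
  <WRRequest>
    <Request domain="foo.com">
      <Rows>
        <Row ref="time" type="group"/>
        <Row type="group"/>
      </Rows>
      <Columns>
        <Column ref="user_id"/>
      </Columns>
    </Request>
  </WRRequest>
</Requests>
XML

my %seen;         # element name => number of occurrences
my %missing_ref;  # element name => occurrences without a 'ref' attribute
my %wants_ref = map { $_ => 1 } qw(Row Column);  # assumption for this sketch

my $parser = XML::Parser->new(
    Handlers => {
        # Start handlers receive the Expat object, the element name,
        # and then the attributes as a flat name/value list.
        Start => sub {
            my ($expat, $element, %attr) = @_;
            $seen{$element}++;
            $missing_ref{$element}++
                if $wants_ref{$element} && !defined $attr{ref};
        },
    },
);
$parser->parse($xml);

print "$_: $seen{$_}\n" for sort keys %seen;
print "$_ without ref: $missing_ref{$_}\n" for sort keys %missing_ref;
```

Comparing the printed names against the strings passed to child() and children() shows any case or spelling mismatch, and the second list shows where attribute('ref') would return undef.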

I actually took the time to get it working (only changing the case of the element names and modifying the "sample data" a bit):

use strict;
use warnings;
use XML::Parser;
use XML::SimpleObject;
use Cwd;


my $inXML = join "", <DATA>;
print $inXML;

my $parser = new XML::Parser(Style => 'Tree');
my $xso = XML::SimpleObject->new( $parser->parse($inXML) );


foreach my $wrRequest ($xso->child('Requests')->children('WRRequest')) {
    print "Client Name: " . $wrRequest->attribute('clientName') . "\n";
    foreach my $xmlRequest ($wrRequest->child('Request')) {
        print "Domain name: " . $xmlRequest->attribute('domain') . "\n";
        print "Service: " . $xmlRequest->attribute('service') . "\n";
        foreach my $xmlRow ($xmlRequest->child('Rows')->children('Row')) {
            print "Row Reference: " . $xmlRow->attribute('ref') . "\n";
        }
        foreach my $xmlColumn ($xmlRequest->child('Columns')->children('Column')) {
            print "Column Reference: " . $xmlColumn->attribute('ref') . "\n";
        }
    }
    print "\n";
}


__DATA__
<Requests>
  <WRRequest clientName="foo">
    <Request service="fooService" domain="foo.com">
      <Rows>
        <Row includeascolumn="n" interval="hour" ref="time" type="group"/>
        <Row includeascolumn="n"  ref="domain_id" type="group"/>
      </Rows>
      <Columns>
        <Column ref="user_id"/>
        <Column ref="country_id"/>
        <Column ref="country_name"/>
      </Columns>
    </Request>
  </WRRequest>
</Requests>

Output:

Client Name: foo
Domain name: foo.com
Service: fooService
Row Reference: time
Row Reference: domain_id
Column Reference: user_id
Column Reference: country_id
Column Reference: country_name

I have tested it with multiple WRRequest elements (copy and paste) and it works like a charm.
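If the input really holds one complete document per line, as the question's title suggests, a single parse of the whole file would fail because XML allows only one root element. In that case each line can be parsed on its own. This is only a sketch under that assumption: the two sample lines, the `@docs` structure, and the '(no ref)' placeholder are invented for illustration, and the `//` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical input: every line is a complete, well-formed document.
my @lines = (
    '<WRRequest><Request domain="foo.com"><Columns><Column ref="user_id"/><Column ref="country_id"/></Columns></Request></WRRequest>',
    '<WRRequest><Request domain="bar.com"><Columns><Column ref="country_name"/></Columns></Request></WRRequest>',
);

my @docs;  # one hash per successfully parsed line

for my $line (@lines) {
    next unless $line =~ /\S/;   # skip blank lines

    my %doc = (columns => []);
    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my (undef, $element, %attr) = @_;
                $doc{domain} = $attr{domain} if $element eq 'Request';
                # Fall back to a placeholder instead of printing undef.
                push @{ $doc{columns} }, $attr{ref} // '(no ref)'
                    if $element eq 'Column';
            },
        },
    );

    # parse() dies on malformed input; skip that line and carry on.
    unless (eval { $parser->parse($line); 1 }) {
        warn "skipping unparsable line: $@";
        next;
    }
    push @docs, \%doc;
}

for my $doc (@docs) {
    print "Domain name: $doc->{domain}\n";
    print "Column Reference: $_\n" for @{ $doc->{columns} };
    print "\n";
}
```

When reading from a real file, the loop over `@lines` would become a `while (my $line = <$fh>)` loop over an opened filehandle.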

Answer 1 (score: 1)

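I can't be sure how the data should ideally be organized, but I find XML::Rules handy in cases like this. If you're open to a completely different approach, here is an example (I'm assuming 'ref' is the key for each row, the column names should stay in order, all you care about for columns is the 'ref' attribute, and so on):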

use strict;
use warnings;

use Data::Dumper;
use XML::Rules;

my $xml = <<XML;
<Requests>
  <WRRequest>
    <Request domain="foo.com" service="SomeService">
      <Rows>
        <Row includeascolumn="n" interval="hour" ref="time" type="group"/>
        <Row includeascolumn="n"  ref="domain_id" type="group"/>
      </Rows>
      <Columns>
        <Column ref="user_id"/>
        <Column ref="country_id"/>
        <Column ref="country_name"/>
      </Columns>
    </Request>
  </WRRequest>
</Requests>
XML

my @rules = (
  Request => sub { delete $_[1]->{_content}; print Dumper $_[1]; return },
  Rows    => 'pass no content',
  Columns => 'pass no content',
  Row     => 'no content by ref',
  Column  => sub { '@'.$_[0] => $_[1]{ref} },
);

my $p = XML::Rules->new(
  rules => \@rules,
);
$p->parse($xml);

__END__
$VAR1 = {
          'Column' => [
                      'user_id',
                      'country_id',
                      'country_name'
                    ],
          'domain' => 'foo.com',
          'time' => {
                    'type' => 'group',
                    'includeascolumn' => 'n',
                    'interval' => 'hour'
                  },
          'domain_id' => {
                         'type' => 'group',
                         'includeascolumn' => 'n'
                       },
          'service' => 'SomeService'
        };