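How to print an empty XML element in Perl

Time: 2015-06-10 22:59:48

Tags: xml perl xml-parsing

I occasionally write simple Perl scripts that export data from XML files to CSV files so that it can be loaded into a database.

I have run into a problem printing an element that has no value. Instead of printing nothing, the script prints the string "HASH(0x1ca05f8)" (or one of its siblings).

How do I stop it from doing this?

Below are the code I am using and the data I am running it on. Thanks, - sw

parse.pl: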

#!/usr/bin/perl
#use module
use XML::Simple;
use Data::Dumper;

#create object
$xml = new XML::Simple;

#read XML file
$data = $xml->XMLin("$ARGV[0]", ForceArray=>1);

foreach $pr (@{$data->{product}})
{
  foreach $rv (@{$pr->{reviews}})
  {
    foreach $fr (@{$rv->{fullreview}})
    {
      print "$ARGV[1]", ",";
      print "$ARGV[2]", ",";
      print "$ARGV[3]", ",";
      print "$ARGV[4]", ",";
      print $pr->{"pageid"}->[0], ",";
      print $fr->{"status"}->[0], ",";
      print $fr->{"source"}->[0], ",";
      print $fr->{"createddate"}->[0], ",";
      print $fr->{"overallrating"}->[0], ",";
      print $fr->{"email_address_from_user"}->[0], ",";
      foreach $csg (@{$fr->{confirmstatusgroup}})
      {
        print join(";", @{$csg->{"confirmstatus"}});
      }


      print "\n";
    }
  }
}

data.xml:

<?xml version="1.0" encoding="UTF-8"?>
<products xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<product xsi:type="ProductWithReviews" locale="en_US">
<pageid>bshnbat612</pageid>
<reviews>
<fullreview>
<status>Approved</status>
<createddate>2014-03-28</createddate>
<source>email</source>
<confirmstatusgroup>
<confirmstatus>Verified Purchaser</confirmstatus>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<overallrating>5</overallrating>
<email_address_from_user/>
</fullreview>
</reviews>
</product>
</products>

This produces the output:

,,,,bshnbat612,Approved,email,2014-03-28,5,HASH(0xe9fee8),Verified Purchaser;Verified Reviewer

Following the suggestion below, here is the Dumper output:

$VAR1 = {
    'xmlns:xsi' => 'http://www.w3.org/2001/XMLSchema-instance',
    'product' => [
        {
            'xsi:type' => 'ProductWithReviews',
            'reviews' => [
                {
                    'fullreview' => [
                        {
                            'source' => [
                                'email'
                            ],
                            'email_address_from_user' => [
                                {}
                            ],
                            'overallrating' => [
                                '5'
                            ],
                            'confirmstatusgroup' => [
                                {
                                    'confirmstatus' => [
                                        'Verified Purchaser',
                                        'Verified Reviewer'
                                    ]
                                }
                            ],
                            'status' => [
                                'Approved'
                            ],
                            'createddate' => [
                                '2014-03-28'
                            ]
                        }
                    ]
                }
            ],
            'pageid' => [
                'bshnbat612'
            ],
            'locale' => 'en_US'
        }
    ]
};

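2 Answers:

Answer 0 (score: 2)

Well, there is a big hint in the XML::Simple documentation:

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended.

Personally, I like XML::Twig: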

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

sub print_full_review {
    my ( $twig, $full_review ) = @_;
    my $pageid =
        $twig->root->get_xpath( '/products/product/pageid', 0 )->text;

    print join(
        ",",
        @ARGV[ 1 .. 4 ],
        $pageid,
        $full_review->first_child_text('status'),
        $full_review->first_child_text('source'),
        $full_review->first_child_text('createddate'),
        $full_review->first_child_text('overallrating'),
        $full_review->first_child_text('email_address_from_user'),
        join( ";",
            map { $_->text }
                $full_review->first_child('confirmstatusgroup')->children() )
        ),
        "\n";
}

my $twig = XML::Twig->new(
    'pretty_print'  => 'indented_a',
    'twig_handlers' => { 'fullreview' => \&print_full_review }
);
$twig->parsefile( $ARGV[0] );

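The handler 'print_full_review' is triggered every time the parser encounters a fullreview element, at any level in the tree. If that is a problem, you can be more specific by registering the handler for /products/product/reviews/fullreview instead.

The handler is passed the fullreview element to process.

From that element we extract the values you are after.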

join( ";",
    map { $_->text }
        $full_review->first_child('confirmstatusgroup')->children() )

is a slightly more compact way of writing roughly this loop (which, unlike join, also leaves a trailing ";"):

my $confirmstatusgroup = $full_review -> first_child('confirmstatusgroup');
foreach my $confirmstatus ( $confirmstatusgroup -> children ) { 
    print $confirmstatus -> text,";";
}

Either way, the code above produces the output you want, without needing any 'SuppressEmpty' fudging of the kind XML::Simple requires.
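For what it's worth, the more specific handler path can be sketched like this. This is only an illustration under a few assumptions: the inline two-product sample, the @rows array, and the reduced set of columns are made up for the example, and looking up the pageid through the nearest enclosing product (rather than from the document root) is a choice I made so that files with several products get the right id on each row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Made-up sample data: two products, one with an empty email element.
my $xml = <<'XML';
<products>
  <product>
    <pageid>p1</pageid>
    <reviews>
      <fullreview>
        <status>Approved</status>
        <email_address_from_user/>
      </fullreview>
    </reviews>
  </product>
  <product>
    <pageid>p2</pageid>
    <reviews>
      <fullreview>
        <status>Pending</status>
        <email_address_from_user>a@example.com</email_address_from_user>
      </fullreview>
    </reviews>
  </product>
</products>
XML

my @rows;    # collected CSV lines (hypothetical name)

my $twig = XML::Twig->new(
    twig_handlers => {
        # Absolute path: fires only for fullreview at exactly this position.
        '/products/product/reviews/fullreview' => sub {
            my ( $t, $review ) = @_;

            # Nearest enclosing <product>, not the first one in the file.
            my $product = $review->parent('product');

            push @rows, join( ",",
                $product->first_child_text('pageid'),
                $review->first_child_text('status'),
                # An empty element simply yields an empty string here.
                $review->first_child_text('email_address_from_user'),
            );
        },
    },
);
$twig->parse($xml);

print "$_\n" for @rows;
```

The empty email_address_from_user element comes back as an empty string, so the CSV column stays blank without any special handling.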

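Answer 1 (score: 1)

Have a look at the SuppressEmpty option that can be passed to XML::Simple. Without it, XML::Simple represents an empty element as an empty hash. If you call XMLin("$ARGV[0]", ForceArray=>1, SuppressEmpty=>1); your output should be: ,,,,bshnbat612,Approved,email,2014-03-28,5,,Verified Purchaser;Verified Reviewer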
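To see why the stray string shows up in the first place: the Dumper output in the question has {} under email_address_from_user, and printing a hash reference gives its stringified address. If you would rather keep the parser options unchanged, a small guard does the job too. This is just a sketch using core Perl; text_or_empty is a hypothetical helper name, and the hand-built $fr structure only imitates what XMLin returns with ForceArray => 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pass plain scalars through, blank out anything else.
sub text_or_empty {
    my ($value) = @_;
    return '' if !defined $value;    # element missing entirely
    return '' if ref $value;         # e.g. the {} produced for an empty element
    return $value;
}

# Hand-built stand-in for one fullreview entry (ForceArray => 1 shape).
my $fr = {
    overallrating           => ['5'],
    email_address_from_user => [ {} ],
};

# Interpolating the reference directly gives the HASH(0x...) string.
my $raw = "$fr->{email_address_from_user}[0]";

# With the guard, the empty element becomes an empty CSV field.
my $line = join ",",
    map { text_or_empty( $fr->{$_}[0] ) }
    qw(overallrating email_address_from_user);

print "$raw\n";
print "$line\n";
```

In parse.pl that would mean wrapping each printed field, for example print text_or_empty($fr->{"email_address_from_user"}->[0]), ",";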