How to output attributes using Perl and XML::Simple

Time: 2012-06-05 23:47:56

Tags: xml perl hash

sub parse_xml{
    my $xml_link = $_[0];
    my $xml_content = get($xml_link) or warn "Cant get XML page of " . $xml_link . "\n";
    if(!$xml_content){
        return;
    }
    my $xml =  XML::Simple->new(KeepRoot => 1);
    my $xml_data = $xml->XMLin($xml_content);
    my @items = $xml_data->{rss}{channel}->{item};
   # print Dumper($xml_data);
    foreach my $item (@items) {
        if($item){
             print Dumper($item);             # This is the dump output
             print $item->{author};
             #print $item . "\n";
        }
    }
}

When I try to print the item's author, I only get HASH(memory address) or "Not a HASH reference at ... line ...".

Am I doing something wrong? Why does this error occur?

Here is the Dumper output.

$VAR1 = [
          {
            'link' => 'http://***.com/article/news/betty-white-credits-snickers-golden-opportunities/144290/#comments-67229',
            'author' => {},
            'title' => 'By: ',
            'pubDate' => 'Tue, 08 Jun 2010 12:47 EDT',
            'description' => 'Interesting. At least SHE remembered the product that propelled her to recent recognition. When many people I know have commented on how they loved that Betty White Super Bowl spot, they can't recall the product. Ah, advertising.'
          },
          {
            'link' => 'http://***.com/article/news/betty-white-credits-snickers-golden-opportunities/144290/#comments-67167',
            'author' => {},
            'title' => 'By: ',
            'pubDate' => 'Mon, 07 Jun 2010 13:26 EDT',
            'description' => 'Fun, fun, fun. A great attitude for all of us to take into our careers.'
          },
          {
            'link' => 'http://****.com/article/news/betty-white-credits-snickers-golden-opportunities/144290/#comments-67164',
            'author' => 'username',
            'title' => 'By: username',
            'pubDate' => 'Mon, 07 Jun 2010 12:23 EDT',
            'description' => 'Her appearance of the Comedy Central roast of William Shattner a couple of years ago was great... it seems like her willingness to be irreverent makes her more appealing to us all!  

www.adverspew.com'
          },
          {
            'link' => 'http://****.com/article/news/betty-white-credits-snickers-golden-opportunities/144290/#comments-67142',
            'author' => {},
            'title' => 'By: ',
            'pubDate' => 'Mon, 07 Jun 2010 09:50 EDT',
            'description' => 'Solid interview. I will definitely be tuning into "Hot in Cleveland" next week. We ought to enjoy Ms. White's talents for as long as we have her. She's great!'
          }
        ];

2 answers:

Answer 0: (score: 1)

You are on the right track. I ran your code against the news feed linked from this Stack Overflow page and tweaked it slightly.

use strict;
use warnings;
use LWP::Simple;
use XML::Simple;
use Data::Dumper;

sub parse_xml{
    my $xml_link = $_[0];
    my $xml_content = get($xml_link) or warn "Can't get XML page of " . $xml_link . "\n";
    if(!$xml_content){
        return;
    }
    my $xml =  XML::Simple->new(KeepRoot => 1);
    # A true, non-reference ForceArray value forces every element into an
    # array, which is why each level below is indexed with [0].
    my $xml_data = $xml->XMLin($xml_content, ForceArray => 1);
    foreach my $item ($xml_data->{'feed'}[0]->{'entry'}) {
        # $item is an array reference, so dereference it to reach each entry
        foreach my $entry (@{$item}){
            if($entry){
                print $entry->{'author'}[0]->{'name'}[0]."\n";
                print $entry->{'author'}[0]->{'uri'}[0]."\n";
            }
        }
    }
}
parse_xml('http://stackoverflow.com/feeds/question/10906521');

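This works fine on that example. I suspect you are trying to print something that is not a plain value. In the Stack Overflow feed, for instance, 'author' actually contains child nodes, so if you print $item->{'author'} inside the foreach loop you get the "HASH" result you describe.

Looking at your dump and Borodin's sensible comment, this should work for you: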

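    # A true, non-reference ForceArray value forces every element into an
    # array, which is why each level below is indexed with [0].
    my $xml_data = $xml->XMLin($xml_content, ForceArray => 1);
    my $item = $xml_data->{'rss'}[0]->{'channel'}[0]->{'item'};
    foreach my $entry (@{$item}){
        if($entry){
            # An empty element parses to a hash reference, so print only plain strings
            if(!ref $entry->{'author'}[0]){
                print $entry->{'author'}[0]."\n";
            }
            if(!ref $entry->{'description'}[0]){
                print $entry->{'description'}[0]."\n";
            }
            if(!ref $entry->{'pubDate'}[0]){
                print $entry->{'pubDate'}[0]."\n";
            } # etc.
        }
    }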
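
The dump in the question begins with `$VAR1 = [`, which suggests each `$item` in the original loop was an array reference rather than a single item hash. The sketch below illustrates that point using only core Perl. The `$xml_data` structure is hand-written, hypothetical data shaped like the dump, not the result of parsing the real feed.

```perl
use strict;
use warnings;

# Hypothetical data shaped like the Dumper output in the question.
my $xml_data = {
    rss => {
        channel => {
            item => [
                { author => {},         title => 'By: ' },
                { author => 'username', title => 'By: username' },
            ],
        },
    },
};

# Assigning the array reference to an array gives a one-element list
# whose only element is the reference itself.
my @wrong = $xml_data->{rss}{channel}{item};

# Dereferencing with @{ ... } yields the individual item hashes.
my @items = @{ $xml_data->{rss}{channel}{item} };

printf "without dereference: %d element(s) of type %s\n",
    scalar(@wrong), ref($wrong[0]);
printf "with dereference:    %d element(s) of type %s\n",
    scalar(@items), ref($items[0]);
```

Calling `$wrong[0]->{author}` on such an element treats an array reference as a hash, which is what triggers "Not a HASH reference".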

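Answer 1: (score: 1)

This RSS feed may or may not contain <author> information for each item.

If there is no author, the element still appears in the XML but has no content. It shows up as <author></author>.

XML::Simple represents this as an empty anonymous hash.

So if an item has author information, $item->{author} will be a simple text string. Otherwise it will be a hash reference.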
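
A small experiment can make this behaviour visible. The feed below is a hypothetical two-item document invented for illustration (the titles `first` and `second` are not from the real feed). It assumes XML::Simple and an XML parser backend are installed.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical feed: the first <author> is empty, the second has text.
my $xml_content = <<'XML';
<rss>
  <channel>
    <item><author></author><title>first</title></item>
    <item><author>username</author><title>second</title></item>
  </channel>
</rss>
XML

my $xml_data = XML::Simple->new(KeepRoot => 1)->XMLin($xml_content);
my @items    = @{ $xml_data->{rss}{channel}{item} };

for my $item (@items) {
    my $author = $item->{author};
    # ref() returns 'HASH' for the empty element and '' for a plain string
    my $kind = ref($author) ? ref($author) . ' reference' : 'plain string';
    print "$item->{title}: author is a $kind\n";
}
```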

You can handle this in your code by writing
foreach my $item (@items) {
  my $author = $item->{author};
  # An empty <author> element is a hash reference; treat it as no author
  $author = '' if ref $author;
  print "$author\n";
}
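
As an alternative to checking `ref` on every field, XML::Simple has a `SuppressEmpty` option. Setting it to the empty string asks `XMLin` to represent empty elements as empty strings instead of empty hashes. This is a sketch of that option, not something either answer proposed, and the inline feed is again hypothetical sample data.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical feed: the first <author> is empty, the second has text.
my $xml_content = '<rss><channel>'
    . '<item><author></author><title>first</title></item>'
    . '<item><author>username</author><title>second</title></item>'
    . '</channel></rss>';

# SuppressEmpty => '' turns empty elements into empty strings on input
my $xml      = XML::Simple->new(KeepRoot => 1, SuppressEmpty => '');
my $xml_data = $xml->XMLin($xml_content);
my @authors  = map { $_->{author} } @{ $xml_data->{rss}{channel}{item} };

print "author: '$_'\n" for @authors;
```

With this option every author is a plain string, so the loop body can print it directly. The trade-off is that the option applies to all empty elements in the document, not just `<author>`.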