How to extract data from an RSS feed in Perl with XML::Feed

Time: 2014-02-09 01:27:28

Tags: xml perl rss

I am trying to get the length (duration) of the MP3 files listed in an RSS feed.

Here is the Perl script I am hacking on:

#!/usr/bin/perl
use XML::Feed;
use Data::Dumper;

my $rssurl      = "http://librivox.org/rss/4273";
my $feed = XML::Feed->parse(URI->new($rssurl))
    or die XML::Feed->errstr;
print $feed->title, "\n";
print $feed->description, "\n";
for my $entry ($feed->entries) {
#       print "entery is [$entry]\n";
#       print Dumper( $entry );
        print $entry->title, "\n";
        print $entry->{'http://www.itunes.com/dtds/podcast-1.0.dtd'}{'duration'} . "\n";
        print $entry->duration . "\n";
}

When I run the script, I get this output:

Conquest Over Time by SHAARA, Michael
<p>Pat Travis, a spacer renowned for his luck, is suddenly quite out of it. His job is to beat his competitors to sign newly-Contacted human races to commercial contracts...

But what can he do when he finds he's on a planet that consults astrology for literally every major decision - and he has arrived on one of the worst-aspected days in history?

Michael Shaara, later to write the Pulitzer-winning novel "The Killer Angels", wrote this story for Fantastic Universe in 1956. (Summary by Mark F. Smith)</p>
1 - Section 1

Can't locate object method "duration" via package "XML::Feed::Entry::Format::RSS" at ./get_feed.pl line 15.

If I add print Dumper( $entry ); for debugging, I can see this data:

$VAR1 = bless({
  _version => "2.0",
  entry => {
    "enclosure" => {
      length => "9.6MB",
      type => "audio/mpeg",
      url => "http://www.archive.org/download/conquest_over_time_1005_librivox/conquestovertime_1_shaara_64kb.mp3",
    },
    "http://www.itunes.com/dtds/podcast-1.0.dtd" => { block => "No", duration => "00:20:00", explicit => "No" },
    "item" => ("\n    " x 12),
    "link" => "http://www.archive.org/download/conquest_over_time_1005_librivox/conquestovertime_1_shaara_64kb.mp3",
    "title" => "1 - Section 1",
  },
}, "XML::Feed::Entry::Format::RSS")

The data I want is the duration, 00:20:00. How can I get at it in my script?

Thanks!

2 Answers:

Answer 0 (score: 1)

It looks like there is a top-level key named entry that you need to go through first:

$entry->{'entry'}{'http://www.itunes.com/dtds/podcast-1.0.dtd'}{'duration'}
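
As a rough sketch of that lookup, the snippet below rebuilds the structure from the Data::Dumper output in the question as a plain hash and walks it one key at a time. The helper name itunes_duration is hypothetical, and the layout is only what the dump shows, not a documented part of XML::Feed, so treat it as an illustration of the access path rather than a stable interface.

```perl
use strict;
use warnings;

use Scalar::Util 'reftype';

# Namespace URI used as a hash key in the Data::Dumper output.
my $ITUNES_NS = 'http://www.itunes.com/dtds/podcast-1.0.dtd';

# Stand-in for one entry, copied from the dump in the question.
my $entry = {
    entry => {
        title      => '1 - Section 1',
        $ITUNES_NS => { block => 'No', duration => '00:20:00', explicit => 'No' },
    },
};

# Hypothetical helper: follow entry -> namespace -> duration and
# return nothing as soon as one level is missing, without autovivifying.
sub itunes_duration {
    my ($node) = @_;
    for my $key ('entry', $ITUNES_NS, 'duration') {
        return unless (reftype($node) // '') eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

my $duration = itunes_duration($entry);
print defined $duration ? "$duration\n" : "(no duration)\n";
```

Checking each level with exists means an item without iTunes metadata yields an undefined value instead of a warning or a silently created hash.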

Answer 1 (score: 1)

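It is unwise to dig information out of the internals of an object like this. The only guaranteed behaviour is what the documentation describes, and the author is free to change the implementation at any time as long as the interface stays the same.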

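In particular, this is an unusual implementation of XML namespaces: the element you want appears in the XML as itunes:duration, where itunes is the namespace prefix. The prefix distinguishes it from any other duration element that might appear in the document. You should use XPath to extract the data you need, as described in your previous question. This short program does what you want without using XML::Feed.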

use strict;
use warnings;

use LWP::Simple 'get';
use XML::XPath;

my $rssurl = 'http://librivox.org/rss/4273';
my $xml    = get $rssurl;
my $xp     = XML::XPath->new(xml => $xml);

my ($channel) = $xp->findnodes('/rss/channel');
printf "Channel Title:       %s\n\n", $channel->find('title');
printf "Channel Description: %s\n\n", $channel->find('description');

print "ITEMS\n";
for my $item ($xp->findnodes('/rss/channel/item')) {
  printf "  Item Title:    %s\n", $item->find('title');
  printf "  Item Duration: %s\n", $item->find('itunes:duration');
  print "\n";
}

Output

Channel Title:       Conquest Over Time by SHAARA, Michael

Channel Description: <p>Pat Travis, a spacer renowned for his luck, is suddenly quite out of it. His job is to beat his competitors to sign newly-Contacted human races to commercial contracts...

But what can he do when he finds he's on a planet that consults astrology for literally every major decision - and he has arrived on one of the worst-aspected days in history?

Michael Shaara, later to write the Pulitzer-winning novel "The Killer Angels", wrote this story for Fantastic Universe in 1956. (Summary by Mark F. Smith)</p>

ITEMS
  Item Title:    1 - Section 1
  Item Duration: 00:20:00

  Item Title:    2 - Section 2
  Item Duration: 00:18:35

  Item Title:    3 - Section 3
  Item Duration: 00:25:12

  Item Title:    4 - Section 4
  Item Duration: 00:16:38
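
The namespace point above can also be made explicit in the script. The sketch below is a variation on the program in this answer, not part of it: it parses a small inline sample instead of fetching the feed, and it binds the itunes prefix to its URI with XML::XPath's set_namespace before querying. The sample XML and the helper name item_durations are assumptions for illustration; the namespace URI is the one visible in the Data::Dumper output in the question.

```perl
use strict;
use warnings;

use XML::XPath;

# Namespace URI taken from the Data::Dumper output in the question.
my $ITUNES_NS = 'http://www.itunes.com/dtds/podcast-1.0.dtd';

# Hypothetical helper: return a [title, duration] pair for every item.
sub item_durations {
    my ($xml) = @_;
    my $xp = XML::XPath->new(xml => $xml);

    # Bind the "itunes" prefix used in the expressions below to its URI.
    $xp->set_namespace(itunes => $ITUNES_NS);

    my @rows;
    for my $item ($xp->findnodes('/rss/channel/item')) {
        # Interpolation forces whatever findvalue returns into a plain string.
        my $title    = '' . $xp->findvalue('title', $item);
        my $duration = '' . $xp->findvalue('itunes:duration', $item);
        push @rows, [ $title, $duration ];
    }
    return @rows;
}

# Made-up sample shaped like the feed; a real script would fetch the XML first.
my $sample_xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Conquest Over Time by SHAARA, Michael</title>
    <item>
      <title>1 - Section 1</title>
      <itunes:duration>00:20:00</itunes:duration>
    </item>
    <item>
      <title>2 - Section 2</title>
      <itunes:duration>00:18:35</itunes:duration>
    </item>
  </channel>
</rss>
XML

printf "%-15s %s\n", @$_ for item_durations($sample_xml);
```

Returning plain strings from the helper keeps the XPath objects out of the rest of the script, so the caller only ever sees titles and durations.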