I am trying to use the Perl Web::Scraper module to scrape a page and process various elements.
I wrote the following script:
use strict;
use warnings;
use Web::Scraper;
use Data::Dumper;
use URI;

my $purlToScrape = 'https://isohunt.to/torrents/?ihq=back+to+the+future&Torrent_sort=seeders.desc';
my $movcol = scraper {
    process "td.title-row", "movcol[]" => scraper {
        process "span", "title[]" => 'TEXT';
        process "a", "url[]" => '@href';
    };
};
my $details = $movcol->scrape(URI->new($purlToScrape));
print Dumper($details->{movcol});
Output:
$VAR1 = [
  {
    'url' => [
      bless( do{\(my $o = 'https://isohunt.to/torrent_details/5538709/Back-to-the-Future-III-1990-720p-BrRip-x264-700MB-YIFY')}, 'URI::https' ),
      bless( do{\(my $o = 'https://isohunt.to/torrents/?iht=5&age=0')}, 'URI::https' )
    ],
    'title' => [
      'Back to the Future III (1990) 720p BrRip x264 - 700MB - YIFY'
    ]
  },
  {
    'url' => [
      bless( do{\(my $o = 'https://isohunt.to/torrent_details/6395538/Back-to-the-Future-1985-1080p')}, 'URI::https' ),
      bless( do{\(my $o = 'https://isohunt.to/torrents/?iht=5&age=0')}, 'URI::https' )
    ],
    'title' => [
      'Back to the Future (1985) [1080p]'
    ]
  }
];
What I want to do is process each title element. How can I access these elements in my code?
I tried print Dumper($details->{movcol}->{title}); but that gives me the error "Not a HASH reference".
Answer 0 (score: 2)
$details->{movcol} is an array reference, not a hash reference. Dereference the array to get at the titles:
for (@{$details->{movcol}}) {
    print "$_->{title}[0]\n";
}
Or, to print only the first title:
print "$details->{movcol}[0]{title}[0]\n";
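To see why the original attempt fails, here is a minimal sketch that reproduces the error without any network access. The $details structure is a hand-built stand-in that only keeps the title arrays from the dump in the question; the variable names $bad, $err and $first are just for illustration.

```perl
use strict;
use warnings;

# Hand-built stand-in for the scraper result; only the title arrays are kept.
my $details = {
    movcol => [
        { title => ['Back to the Future III (1990) 720p BrRip x264 - 700MB - YIFY'] },
        { title => ['Back to the Future (1985) [1080p]'] },
    ],
};

# Treating the array reference as a hash reference dies at run time,
# so the attempt is wrapped in eval to capture the message.
my $bad = eval { $details->{movcol}->{title} };
my $err = $@;
print "Failed as expected: $err" if $err;

# Indexing the array first reaches the hash, then the title array.
my $first = $details->{movcol}[0]{title}[0];
print "$first\n";
```

The eval is only there to keep the script running after the failure; in real code you would simply use the indexed form.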
Answer 1 (score: 1)
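In the dump, square brackets denote arrays and curly braces denote hashes. So you can see that $details->{movcol} is an array of hashes, and each hash has an element with the key title whose value is another array.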
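If reading the dump is not enough, the built-in ref function can confirm the shape at each level. The sketch below uses a hand-built structure modelled on the dump, with plain strings in place of the URI objects (a simplification); @kinds is a name made up for this example.

```perl
use strict;
use warnings;

# Stand-in for $details, shaped like the dump in the question.
# URLs are plain strings here rather than URI objects.
my $details = {
    movcol => [
        {
            title => ['Back to the Future III (1990) 720p BrRip x264 - 700MB - YIFY'],
            url   => [
                'https://isohunt.to/torrent_details/5538709/Back-to-the-Future-III-1990-720p-BrRip-x264-700MB-YIFY',
                'https://isohunt.to/torrents/?iht=5&age=0',
            ],
        },
    ],
};

# ref() returns the kind of reference held at each level.
my @kinds = (
    ref($details->{movcol}),            # outer container
    ref($details->{movcol}[0]),         # one row
    ref($details->{movcol}[0]{title}),  # the titles of that row
);
print "movcol: $kinds[0], row: $kinds[1], title: $kinds[2]\n";
```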
You can print the titles like this:
my $movcol = $details->{movcol};
for my $item ( @$movcol ) {
    print $item->{title}[0], "\n";
}
Or you can build an array of title strings with
my @titles = map $_->{title}[0], @{ $details->{movcol} };
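Building on the same idea, here is a sketch that pairs each title with its first link and skips rows where the span matched nothing. The sample data is hand-built: the third row without a title and its URL are hypothetical, added only to exercise the guard, and @rows is an illustrative name.

```perl
use strict;
use warnings;

# Hand-built stand-in for $details; URLs are plain strings here.
my $details = {
    movcol => [
        {
            title => ['Back to the Future III (1990) 720p BrRip x264 - 700MB - YIFY'],
            url   => ['https://isohunt.to/torrent_details/5538709/Back-to-the-Future-III-1990-720p-BrRip-x264-700MB-YIFY'],
        },
        {
            title => ['Back to the Future (1985) [1080p]'],
            url   => ['https://isohunt.to/torrent_details/6395538/Back-to-the-Future-1985-1080p'],
        },
        {
            # Hypothetical row with no title, to show the guard below.
            url => ['https://example.com/no-title'],
        },
    ],
};

my @rows;
for my $item (@{ $details->{movcol} }) {
    my $titles = $item->{title} or next;   # skip rows without a title key
    next unless @$titles;                  # skip rows with an empty title list
    # Quoting the URL stringifies it, which also works for URI objects.
    push @rows, { title => $titles->[0], url => "$item->{url}[0]" };
}

printf "%s\n  %s\n", $_->{title}, $_->{url} for @rows;
```

Checking for the title key before dereferencing avoids autovivifying an empty title array in rows that never had one.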