Question

I'm trying to use the Perl Web::Scraper module to scrape a page and process its various elements.

I wrote the following script:

use strict;
use warnings;
use Web::Scraper;
use Data::Dumper;
use URI;

my $purlToScrape='https://isohunt.to/torrents/?ihq=back+to+the+future&Torrent_sort=seeders.desc';
my $movcol = scraper {
    process "td.title-row", "movcol[]" => scraper {
        process "span", "title[]" => 'TEXT';
        process "a", "url[]" => '@href';
    };
};

my $details = $movcol->scrape(URI->new($purlToScrape));
print Dumper($details->{movcol});

Output:

$VAR1 = [
  {
    'url' => [
               bless( do{\(my $o = 'https://isohunt.to/torrent_details/5538709/Back-to-the-Future-III-1990-720p-BrRip-x264-700MB-YIFY')}, 'URI::https' ),
               bless( do{\(my $o = 'https://isohunt.to/torrents/?iht=5&age=0')}, 'URI::https' )
             ],
    'title' => [
                 'Back to the Future III (1990) 720p BrRip x264 - 700MB - YIFY'
               ]
  },
  {
    'url' => [
               bless( do{\(my $o = 'https://isohunt.to/torrent_details/6395538/Back-to-the-Future-1985-1080p')}, 'URI::https' ),
               bless( do{\(my $o = 'https://isohunt.to/torrents/?iht=5&age=0')}, 'URI::https' )
             ],
    'title' => [
                 'Back to the Future (1985) [1080p]'
               ]
  }
];

What I'm trying to do is process each title element. How can I use these elements in my code?

I tried print Dumper($details->{movcol}->{title});, but that gives me the error "Not a HASH reference".

Answer 1

$details->{movcol} is an array reference. Dereference the array to get at the titles:

for (@{$details->{movcol}}) {
    print "$_->{title}[0]\n";
}
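Because the scraper rule uses "title[]", each $_->{title} is itself an array reference, so a cell containing several span elements would yield several titles and [0] only prints the first. A minimal sketch that prints every title, assuming a hand-built $details that mirrors the dump above (no page is fetched):

```perl
use strict;
use warnings;

# Hand-built stand-in for the scraper result (an assumption for illustration).
my $details = {
    movcol => [
        { title => ['Back to the Future III (1990) 720p BrRip x264 - 700MB - YIFY'] },
        { title => ['Back to the Future (1985) [1080p]'] },
    ],
};

my @printed;
for my $row (@{ $details->{movcol} }) {
    # $row->{title} is an array ref; loop over all of it,
    # treating a row with no <span> (no title key) as having no titles.
    for my $title (@{ $row->{title} || [] }) {
        print "$title\n";
        push @printed, $title;
    }
}
```

The "|| []" guard is a design choice: without it, a row lacking a title key would be dereferenced as undef, which is an error under use strict.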

Or, to print only the first title:

print "$details->{movcol}[0]{title}[0]\n";
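A sketch that reproduces the error from the question and contrasts it with the working subscript chain. The $details structure is hand-built to match the dump (plain strings stand in for the URI::https objects), which is an assumption for illustration:

```perl
use strict;
use warnings;

my $details = {
    movcol => [
        { title => ['Back to the Future III (1990) 720p BrRip x264 - 700MB - YIFY'] },
        { title => ['Back to the Future (1985) [1080p]'] },
    ],
};

# The attempt from the question: ->{title} applied to an ARRAY ref dies.
my $ok    = eval { my $t = $details->{movcol}->{title}; 1 };
my $error = $ok ? '' : $@;
print "Caught: $error" unless $ok;

# Working chain: hash key -> array index -> hash key -> array index.
my $second_title = $details->{movcol}[1]{title}[0];
print "$second_title\n";
```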

Answer 2

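In the dump, square brackets denote arrays and curly braces denote hashes. So you can see that $details->{movcol} is an array of hashes, each of which has an element with the key title whose value is another array.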
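The same layering can be checked from code with Perl's built-in ref(), which returns the kind of reference it is given. A small sketch, using a one-row hand-built copy of the dump as an assumption:

```perl
use strict;
use warnings;

my $details = {
    movcol => [
        { title => ['Back to the Future (1985) [1080p]'],
          url   => ['https://isohunt.to/torrent_details/6395538/Back-to-the-Future-1985-1080p'] },
    ],
};

# ref() reports the kind of reference found at each level of the structure.
my @kinds = (
    ref($details),                      # HASH
    ref($details->{movcol}),            # ARRAY
    ref($details->{movcol}[0]),         # HASH
    ref($details->{movcol}[0]{title}),  # ARRAY
);
print join(' -> ', @kinds), "\n";       # HASH -> ARRAY -> HASH -> ARRAY
```

Reading the chain left to right tells you which subscript to use at each step: {...} for a HASH, [...] for an ARRAY.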

You can print the titles like this:

my $movcol = $details->{movcol};

for my $item ( @$movcol ) {
    print $item->{title}[0], "\n";
}
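The same loop can pair each title with a link from the url array. This sketch assumes, based only on the dump above, that the first href in each cell is the torrent details page and the second is a category link; the %url_for hash is a hypothetical name. Plain strings stand in for the URI::https objects, which stringify when quoted:

```perl
use strict;
use warnings;

my $movcol = [
    { title => ['Back to the Future III (1990) 720p BrRip x264 - 700MB - YIFY'],
      url   => ['https://isohunt.to/torrent_details/5538709/Back-to-the-Future-III-1990-720p-BrRip-x264-700MB-YIFY',
                'https://isohunt.to/torrents/?iht=5&age=0'] },
    { title => ['Back to the Future (1985) [1080p]'],
      url   => ['https://isohunt.to/torrent_details/6395538/Back-to-the-Future-1985-1080p',
                'https://isohunt.to/torrents/?iht=5&age=0'] },
];

my %url_for;    # hypothetical: title => details URL
for my $item (@$movcol) {
    my $title = $item->{title}[0];
    # Assumption: the first <a> in each cell is the details link, as in the dump.
    # Quoting forces a URI object to its string form; plain strings are unchanged.
    my $url = "$item->{url}[0]";
    $url_for{$title} = $url;
    print "$title => $url\n";
}
```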

Or you can create an array of title strings with

my @titles = map $_->{title}[0], @{ $details->{movcol} };
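If some td.title-row cell has no span, that row yields undef from the map, and the subscript also autovivifies an empty title array on it. A sketch that filters such entries out; the third row is hypothetical and added only to show the case:

```perl
use strict;
use warnings;

my $details = {
    movcol => [
        { title => ['Back to the Future III (1990) 720p BrRip x264 - 700MB - YIFY'] },
        { title => ['Back to the Future (1985) [1080p]'] },
        { url   => ['https://isohunt.to/torrents/?iht=5&age=0'] },  # hypothetical row with no title
    ],
};

my @titles = map $_->{title}[0], @{ $details->{movcol} };

# A row without a title contributes undef; drop it before using the list.
my @clean = grep { defined } @titles;
print scalar(@clean), " titles\n";
print "  $_\n" for @clean;
```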
