Here is the example I'm working with:
#!/usr/bin/perl
use strict;
use Web::Scraper;
use Data::Dumper;
my $html = q[
<html>
<body>
<div class="mainContainer">
<div class="when">February 20, 2014</div>
<div class="name">Name 1</div>
<div class="desc">Desc 1</div>
<div class="when">February 21, 2014</div>
<div class="name">Name 2</div>
<div class="desc">Desc 2</div>
<div class="name">Name 3</div>
<div class="desc">Desc 3</div>
<div class="when">February 22, 2014</div>
<div class="name">Name 4</div>
<div class="desc">Desc 4</div>
</div>
</body>
</html>
];
my $scraper = scraper {
    process ".when", "events[]" => scraper {
        my $when = $_->content();
        my $hash = {};
        $hash->{$when}->{name} = "NAME";
        $hash->{$when}->{desc} = "DESC";
        return $hash;
    };
};
my $result = $scraper->scrape($html);
print Dumper( $result );
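What I want to do is associate each date with its event details. As you can see, the divs are not nested, so it isn't that simple (at least not for me). Also, each event consists of a name and a desc. I haven't found a way to use CSS selectors to tie adjacent elements together in the structure I need. I think I need a custom subroutine that does the association and returns it. What I'd like to retrieve looks something like this: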
[
    'February 20, 2014' => [
        {
            'name' => 'Name 1',
            'desc' => 'Desc 1'
        }
    ],
    'February 21, 2014' => [
        {
            'name' => 'Name 2',
            'desc' => 'Desc 2'
        },
        {
            'name' => 'Name 3',
            'desc' => 'Desc 3'
        }
    ],
    'February 22, 2014' => [
        {
            'name' => 'Name 4',
            'desc' => 'Desc 4'
        }
    ]
]
Answer 0 (score: 0)
It might be better to grab the data first and then process it after the scraper has run. So:
my $scraper = scraper {
    # Collect each kind of div into its own flat list, in document order
    process ".when", "dates[]" => "TEXT";
    process ".name", "names[]" => "TEXT";
    process ".desc", "desc[]"  => "TEXT";
};
my $result = $scraper->scrape($html);

# Here you would start processing these
my @dates = @{ $result->{dates} };
my @names = @{ $result->{names} };
my @info  = @{ $result->{desc} };
my %events;

# Pair the lists up by index: this only works if the i-th date
# belongs to the i-th name and the i-th description
for ( my $i = 0; $i < scalar @dates; $i++ ) {
    my $date = $dates[$i];
    my $name = $names[$i];
    my $info = $info[$i];
    if ( exists $events{$date} ) {
        push @{ $events{$date} }, { 'name' => $name, 'desc' => $info };
    }
    else {
        $events{$date} = [ { 'name' => $name, 'desc' => $info } ];
    }
}
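%events will then hold the data you need. This assumes you still need this, and that every event date is followed by exactly one name and one description. Note that the sample HTML in the question breaks that assumption: February 21 has two events, so the three dates and the four names would not line up by index. Also, I haven't tested this.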
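If a date can be followed by more than one event, one option is to keep the divs in a single list in document order and remember the most recent date while walking through it. The following is only a sketch of that idea, not tested against Web::Scraper itself: the rows are hard-coded so it runs with core Perl alone, and `group_events` is a hypothetical helper name. It assumes the rows could be collected with a rule along the lines of `process ".mainContainer > div", "rows[]" => { class => '@class', text => 'TEXT' };` and that each desc follows the name it belongs to.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Rows in document order. Assumption: with Web::Scraper these could come from
#   process ".mainContainer > div", "rows[]" => { class => '@class', text => 'TEXT' };
# They are hard-coded here so the sketch runs without the module.
my @rows = (
    { class => 'when', text => 'February 20, 2014' },
    { class => 'name', text => 'Name 1' },
    { class => 'desc', text => 'Desc 1' },
    { class => 'when', text => 'February 21, 2014' },
    { class => 'name', text => 'Name 2' },
    { class => 'desc', text => 'Desc 2' },
    { class => 'name', text => 'Name 3' },
    { class => 'desc', text => 'Desc 3' },
    { class => 'when', text => 'February 22, 2014' },
    { class => 'name', text => 'Name 4' },
    { class => 'desc', text => 'Desc 4' },
);

# Hypothetical helper: group name/desc rows under the date that precedes them.
sub group_events {
    my @rows = @_;
    my ( %events, $date, $event );
    for my $row (@rows) {
        my ( $class, $text ) = @{$row}{qw(class text)};
        if ( $class eq 'when' ) {
            # A new date starts a new (possibly empty) list of events
            $date  = $text;
            $event = undef;
            $events{$date} ||= [];
        }
        elsif ( $class eq 'name' ) {
            # Ignore names that appear before any date
            next unless defined $date;
            $event = { name => $text };
            push @{ $events{$date} }, $event;
        }
        elsif ( $class eq 'desc' && $event ) {
            # Attach the description to the most recent name
            $event->{desc} = $text;
        }
    }
    return \%events;
}

my $events = group_events(@rows);
$Data::Dumper::Sortkeys = 1;
print Dumper($events);
```

With the rows above, the entry for 'February 21, 2014' ends up holding two events (Name 2 and Name 3), which the index-based pairing cannot produce. The result is a hash keyed by date, so the original date order is not preserved; keeping a separate list of dates as they are first seen would cover that if the order matters.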