我正在编写一个脚本来重新排列HTML内容,我遇到了两个问题。我有这个html结构,这是电影标题和发行年份,缩略图分为5列。我希望生成新的html文件,其中包含从2011年到1911年分组数十年的电影,例如呈现-2011; 2010至01年; 2000年至1991年;等
<table>
<tr>
<td class="basic" valign="top">
<a href="details/267226.html" title="" id="thumbimage">
<img src="images/267226f.jpg"/>
</a>
<br/>Cowboys & Aliens  (2011)
</td>
<td class="basic" valign="top">
<a href="details/267185.html" title="" id="thumbimage">
<img src="images/267185f.jpg"/>
</a>
<br/>The Hangover Part II  (2011)
</td>
<td class="basic" valign="top">
<a href="details/267138.html" title="" id="thumbimage">
<img src="images/267138f.jpg"/>
</a>
<br/>Friends With Benefits  (2011)
</td>
<td class="basic" valign="top">
<a href="details/266870.html" title="" id="thumbimage">
<img src="images/266870f.jpg"/>
</a>
<br/>Beauty And The Beast  (1991)
</td>
<td class="basic" valign="top">
<a href="details/266846.html" title="" id="thumbimage">
<img src="images/266846f.jpg"/>
</a>
<br/>The Fox And The Hound  (1981)
</td>
</tr>
......
</table>
我不知道如何解决的一个问题是,在删除与十年不匹配的电影后,我留下了空的'tr'标签和缩略图位置,并且不知道如何重新排列5列填充的每一行有5个冠军。以及如何通过一次调用脚本来处理每个十年。感谢。
use autodie;
use strict;
use warnings;
use File::Slurp;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file( 'test.html' );
for my $h ( $tree->look_down( class => 'basic' ) ) {
edit_links( $h );
my ($year) = ($h->as_text =~ /.*?\((\d+)\).*/);
if ($year > 2010 or $year < 2001) {
$h->detach;
write_file( "decades/2010-2001.html", \$tree->as_HTML('<>&',' ',{}), "\n" );
}
}
sub edit_links {
my $h = shift;
for my $link ( $h->find_by_tag_name( 'a' ) ) {
my $href = '../'.$link->attr( 'href' );
$link->attr( 'href', $href );
}
for my $link ( $h->find_by_tag_name( 'img' ) ) {
my $src = '../'.$link->attr( 'src' );
$link->attr( 'src', $src );
}
}
答案 0 :(得分:0)
下面的方法应该做你想要的问题。在HTML文件处理期间,设置了散列%decade
,每个键都以十年为结尾,并且值为适当单元格的arrayref。
第二个循环遍历散列并输出每十年的文件,使用<tr>
标记围绕每个5个单元格。
use strict;
use HTML::TreeBuilder;
use File::Slurp;
use List::MoreUtils qw(part);
my $tree = HTML::TreeBuilder->new_from_file('test.html');
my %decade = ();
for my $h ( $tree->look_down( class => 'basic' ) ) {
edit_links( $h );
my ($year) = ($h->as_text =~ /.*?\((\d+)\).*/);
my $dec = (int($year/10) + 1) * 10;
$decade{$dec} ||= [];
push @{$decade{$dec}}, $h;
}
for my $dec (sort { $b <=> $a } keys %decade) {
my $filename = "decades/" . $dec . "-" . ($dec - 9) . ".html";
my $idx = 0;
my @items = map { $_->as_HTML('<>&',' ',{}) } @{ $decade{$dec} };
my $contents = join('',
'<table>',
(map { "<tr>@$_</tr>" } part { int($idx++ / 5) } @items),
'</table>');
write_file( $filename, $contents);
}
...