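Thanks to the help on this site I have made some progress with Perl, but I have run into a problem. One of the pages I scrape has changed, and I can no longer work out how to get what I need from it. What I want to do is store the link to every page I need to visit. The problem is that those links sit inside href attributes in the page source, and I don't know how to extract them. Can anyone help?
The links I need are on lines 316 to 354 of the source of this page: http://www.soccerbase.com/teams/home.sd
Basically, I need to extract the links into variables so that I can use them in my other scripts. I am using WWW::Mechanize and HTML::TokeParser and was hoping to stick with those, but I can't figure it out at the moment. Thanks in advance!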
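Answer 0 (score: 0)
See the method find_all_links in WWW::Mechanize. There is no need to bother with a parser by hand. You will probably want to relax the regex so that you get all ~1000 teams at once.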
use WWW::Mechanize qw();

my $w = WWW::Mechanize->new;
$w->get('http://www.soccerbase.com/teams/home.sd');
# url_regex keeps only the links whose URL contains comp_id=1
for my $link ($w->find_all_links(url_regex => qr/comp_id=1\b/)) {
    # each $link is a WWW::Mechanize::Link object
    printf "URL=%s\tTeam=%s\n", $link->url_abs, $link->text;
}
URL=http://www.soccerbase.com/tournaments/tournament.sd?comp_id=1 Team=Premier League
URL=http://www.soccerbase.com/teams/team.sd?team_id=142&comp_id=1 Team=Arsenal
URL=http://www.soccerbase.com/teams/team.sd?team_id=154&comp_id=1 Team=Aston Villa
URL=http://www.soccerbase.com/teams/team.sd?team_id=308&comp_id=1 Team=Blackburn
URL=http://www.soccerbase.com/teams/team.sd?team_id=354&comp_id=1 Team=Bolton
URL=http://www.soccerbase.com/teams/team.sd?team_id=536&comp_id=1 Team=Chelsea
URL=http://www.soccerbase.com/teams/team.sd?team_id=942&comp_id=1 Team=Everton
URL=http://www.soccerbase.com/teams/team.sd?team_id=1055&comp_id=1 Team=Fulham
URL=http://www.soccerbase.com/teams/team.sd?team_id=1563&comp_id=1 Team=Liverpool
URL=http://www.soccerbase.com/teams/team.sd?team_id=1718&comp_id=1 Team=Man City
URL=http://www.soccerbase.com/teams/team.sd?team_id=1724&comp_id=1 Team=Man Utd
URL=http://www.soccerbase.com/teams/team.sd?team_id=1823&comp_id=1 Team=Newcastle
URL=http://www.soccerbase.com/teams/team.sd?team_id=1855&comp_id=1 Team=Norwich
URL=http://www.soccerbase.com/teams/team.sd?team_id=2093&comp_id=1 Team=QPR
URL=http://www.soccerbase.com/teams/team.sd?team_id=2477&comp_id=1 Team=Stoke
URL=http://www.soccerbase.com/teams/team.sd?team_id=2493&comp_id=1 Team=Sunderland
URL=http://www.soccerbase.com/teams/team.sd?team_id=2513&comp_id=1 Team=Swansea
URL=http://www.soccerbase.com/teams/team.sd?team_id=2590&comp_id=1 Team=Tottenham
URL=http://www.soccerbase.com/teams/team.sd?team_id=2744&comp_id=1 Team=West Brom
URL=http://www.soccerbase.com/teams/team.sd?team_id=2783&comp_id=1 Team=Wigan
URL=http://www.soccerbase.com/teams/team.sd?team_id=2848&comp_id=1 Team=Wolves
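Since the question asks for the links in a variable rather than printed, here is a minimal sketch of one way to collect them. The helper name team_links and the relaxed pattern team_id=\d+ are assumptions made for illustration: the pattern is guessed from the URLs in the output above and may need adjusting if the page layout changes again.

```perl
use strict;
use warnings;
use WWW::Mechanize qw();

# Hypothetical helper (not part of WWW::Mechanize): returns a hash ref
# mapping link text (team name) => absolute URL for every link on the
# current page whose URL contains a team_id parameter.
sub team_links {
    my ($mech) = @_;
    my %url_for;
    # Assumed pattern: any team link, regardless of comp_id.
    for my $link ($mech->find_all_links(url_regex => qr/\bteam_id=\d+/)) {
        my $name = $link->text;
        # Skip links without usable text (e.g. image-only links).
        next unless defined($name) && length($name);
        # url_abs returns a URI object; store it as a plain string.
        $url_for{$name} = $link->url_abs->as_string;
    }
    return \%url_for;
}

# Usage (needs network access):
#   my $w = WWW::Mechanize->new;
#   $w->get('http://www.soccerbase.com/teams/home.sd');
#   my $url_for = team_links($w);
#   $w->get($url_for->{Arsenal}) if exists $url_for->{Arsenal};
```

A hash keyed by team name is only one possible choice. If two links share the same text, the later one overwrites the earlier, so an array of [name, URL] pairs may be safer in that case.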