I have this PHP code that I'm using to extract some information, but I'm stuck at the href step:
$site = "http://www.sports-reference.com/olympics/countries";
$site_html = file_get_html($site);
$country_dirty = $site_html->getElementById('div_countries');
foreach($country_dirty->find('img') as $link){
    $country = $link->alt;
    $link_country = "$site/$country";
    $link_country_html = file_get_html($link_country);
    $link_season = $link_country_html->getElementById('div_medals');
    foreach($link_season->find('a') as $season){
        echo $link_year_season = $season->href . "\n";
        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
    }
}
The variable $link_year_season produces the following output:
/olympics/countries/AFG/summer/2012/
/olympics/athletes/ba/nesar-ahmad-bahawi-1.html
/olympics/athletes/ni/rohullah-nikpai-1.html
/olympics/countries/AFG/summer/2008/
/olympics/athletes/ba/nesar-ahmad-bahawi-1.html
/olympics/athletes/ni/rohullah-nikpai-1.html
/olympics/countries/AFG/summer/2004/
/olympics/countries/AFG/summer/1996/
/olympics/countries/AFG/summer/1988/
/olympics/countries/AFG/summer/1980/
/olympics/countries/AFG/summer/1972/
.....
I'd like to know whether it's possible to get only this output:
/olympics/countries/AFG/summer/2012/
/olympics/countries/AFG/summer/2008/
/olympics/countries/AFG/summer/2004/
/olympics/countries/AFG/summer/1996/
/olympics/countries/AFG/summer/1988/
/olympics/countries/AFG/summer/1980/
/olympics/countries/AFG/summer/1972/
Answer 0 (score: 0)
You should be able to use this regular expression to check that the link starts with /olympics/countries/AFG/summer/, followed by digits and a /.
foreach($link_season->find('a') as $season){
    if(preg_match('~^/olympics/countries/AFG/summer/\d+/~', $season->href)) {
        echo $link_year_season = $season->href . "\n";
        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
    }
}
Demo: https://regex101.com/r/bZ1vP3/1
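The pattern above hardcodes AFG, while the question's outer loop visits every country. As a rough sketch, the country code could be passed in instead. This assumes the img alt value ($country in the question) is the same code that appears in the hrefs, which the question's output suggests but does not guarantee. filter_season_links is a hypothetical helper name, and the sample hrefs are copied from the question so the snippet runs without fetching anything:

```php
<?php
// Sample hrefs copied from the question's output; no network access needed.
$hrefs = [
    '/olympics/countries/AFG/summer/2012/',
    '/olympics/athletes/ba/nesar-ahmad-bahawi-1.html',
    '/olympics/athletes/ni/rohullah-nikpai-1.html',
    '/olympics/countries/AFG/summer/2008/',
    '/olympics/countries/AFG/summer/2004/',
];

// Hypothetical helper: keep only links of the form
// /olympics/countries/<CODE>/summer/<digits>/ for the given country code.
function filter_season_links(array $hrefs, string $country): array
{
    // preg_quote escapes any regex metacharacters in the country code
    // (and the ~ delimiter), so the code is matched literally.
    $pattern = '~^/olympics/countries/' . preg_quote($country, '~') . '/summer/\d+/~';

    // preg_grep returns the matching elements with their original keys;
    // array_values reindexes them from 0.
    return array_values(preg_grep($pattern, $hrefs));
}

$season_links = filter_season_links($hrefs, 'AFG');
echo implode("\n", $season_links) . "\n";
```

Inside the question's loop, the same pattern string could be built once per country and used with preg_match on each $season->href.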
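You can also pull out the year by capturing the digits after summer. This version assumes a four-digit year; the first regex only checks for one or more digits, so this one is stricter.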
foreach($link_season->find('a') as $season){
    if(preg_match('~^/olympics/countries/AFG/summer/(\d{4})/~', $season->href, $year)) {
        echo $link_year_season = $season->href . "\n";
        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
        echo 'The year is ' . $year[1] . "\n";
    }
}
If the season also varies, you can use (?:summer|winter) so that either summer or winter can be the fourth directory.
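Putting the pieces together, here is a minimal sketch that accepts either season and returns the parts of the link. It uses a named capturing group for the season rather than the non-capturing (?:summer|winter), so the matched season can be reported. parse_season_link is a hypothetical helper name, and [A-Z]{3} is an assumption that every country code is three uppercase letters like AFG:

```php
<?php
// Hypothetical helper: split a season link into country, season and year,
// or return null when the href is not a season link.
function parse_season_link(string $href): ?array
{
    // Assumption: country codes are three uppercase letters, as in AFG.
    $pattern = '~^/olympics/countries/(?<country>[A-Z]{3})/(?<season>summer|winter)/(?<year>\d{4})/~';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $href, $m) !== 1) {
        return null;
    }

    return [
        'country' => $m['country'],
        'season'  => $m['season'],
        'year'    => (int) $m['year'],
    ];
}

$parsed = parse_season_link('/olympics/countries/AFG/summer/2012/');
echo $parsed['season'] . ' ' . $parsed['year'] . "\n";
```

Athlete links such as /olympics/athletes/ni/rohullah-nikpai-1.html do not match the pattern, so the helper returns null for them and they can be skipped in the loop.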