How to get only some of the href attributes

Time: 2016-02-25 16:23:37

Tags: php

I have this PHP code that tries to extract some information, but I am stuck at the href step:

$site = "http://www.sports-reference.com/olympics/countries";
$site_html = file_get_html($site);

$country_dirty = $site_html->getElementById('div_countries');

foreach ($country_dirty->find('img') as $link) {
    $country = $link->alt;
    $link_country = "$site/$country";
    $link_country_html = file_get_html($link_country);

    $link_season = $link_country_html->getElementById('div_medals');

    foreach ($link_season->find('a') as $season) {
        echo $link_year_season = $season->href . "\n";

        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
    }
}

The variable $link_year_season produces the following output:

/olympics/countries/AFG/summer/2012/
/olympics/athletes/ba/nesar-ahmad-bahawi-1.html
/olympics/athletes/ni/rohullah-nikpai-1.html
/olympics/countries/AFG/summer/2008/
/olympics/athletes/ba/nesar-ahmad-bahawi-1.html
/olympics/athletes/ni/rohullah-nikpai-1.html
/olympics/countries/AFG/summer/2004/
/olympics/countries/AFG/summer/1996/
/olympics/countries/AFG/summer/1988/
/olympics/countries/AFG/summer/1980/
/olympics/countries/AFG/summer/1972/
.....

I would like to know whether it is possible to get only this output:

/olympics/countries/AFG/summer/2012/
/olympics/countries/AFG/summer/2008/
/olympics/countries/AFG/summer/2004/
/olympics/countries/AFG/summer/1996/
/olympics/countries/AFG/summer/1988/
/olympics/countries/AFG/summer/1980/
/olympics/countries/AFG/summer/1972/

1 answer:

Answer 0: (score: 0)

You should be able to use this regular expression to check that the link starts with /olympics/countries/AFG/summer/, followed by one or more digits and a /:

foreach($link_season->find('a') as $season){
    if(preg_match('~^/olympics/countries/AFG/summer/\d+/~', $season->href)) {
         echo $link_year_season = $season->href . "\n";
         //echo $link_season = strstr ($link_year_season,'summer') . "\n";
    }
}

Demo: https://regex101.com/r/bZ1vP3/1
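To try the pattern without fetching the page, it can be run against a few of the hrefs shown in the question. This is a minimal sketch, assuming the hrefs have already been collected into a plain array ($hrefs and $seasons are names chosen for illustration); preg_grep returns the entries that match the pattern.

```php
<?php
// Sample hrefs copied from the question's output.
$hrefs = [
    '/olympics/countries/AFG/summer/2012/',
    '/olympics/athletes/ba/nesar-ahmad-bahawi-1.html',
    '/olympics/athletes/ni/rohullah-nikpai-1.html',
    '/olympics/countries/AFG/summer/2008/',
];

// Keep only links that start with the country/season prefix followed by
// digits and a slash. preg_grep preserves the original keys, so
// array_values is used to reindex the result from 0.
$seasons = array_values(
    preg_grep('~^/olympics/countries/AFG/summer/\d+/~', $hrefs)
);

// Print one season link per line; the athlete links are dropped.
echo implode("\n", $seasons) . "\n";
```

The athlete links do not start with /olympics/countries/, so they fail the anchored match.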

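You can also extract the year by capturing the digits after summer. This assumes a four-digit year, so it is stricter than the first regex, which only checks for one or more digits: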

foreach($link_season->find('a') as $season){
    if(preg_match('~^/olympics/countries/AFG/summer/(\d{4})/~', $season->href, $year)) {
         echo $link_year_season = $season->href . "\n";
         //echo $link_season = strstr ($link_year_season,'summer') . "\n";
         echo 'The year is ' . $year[1] . "\n";
    }
}

If the season also varies, you can use (?:summer|winter), so that either summer or winter can be the fourth directory.
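Since the question loops over every country, the hard-coded AFG could also be replaced by the current country code. The following is a minimal sketch rather than a definitive implementation: filterSeasonLinks is a hypothetical helper, it assumes the hrefs are passed in as a plain array, and the winter link and the ALB link in the usage lines are made-up inputs for illustration only. Here the season is captured with a plain group instead of (?:...) so that it can be returned.

```php
<?php
/**
 * Hypothetical helper: keep only the season links for one country and
 * return the season and year captured from each matching href.
 */
function filterSeasonLinks(array $hrefs, string $country): array
{
    // preg_quote escapes regex metacharacters (and the ~ delimiter)
    // in the country code before it is placed into the pattern.
    $pattern = '~^/olympics/countries/' . preg_quote($country, '~')
             . '/(summer|winter)/(\d{4})/~';

    $result = [];
    foreach ($hrefs as $href) {
        if (preg_match($pattern, $href, $m)) {
            $result[] = [
                'href'   => $href,
                'season' => $m[1],       // "summer" or "winter"
                'year'   => (int) $m[2], // four-digit year as an integer
            ];
        }
    }
    return $result;
}

// Usage with illustrative input (the winter and ALB links are made up).
$links = filterSeasonLinks([
    '/olympics/countries/AFG/summer/2012/',
    '/olympics/athletes/ni/rohullah-nikpai-1.html',
    '/olympics/countries/AFG/winter/2010/',
    '/olympics/countries/ALB/summer/2012/',
], 'AFG');

foreach ($links as $link) {
    echo $link['season'] . ' ' . $link['year'] . ': ' . $link['href'] . "\n";
}
```

In the question's loop, $country (taken from the img alt attribute) would be the natural value to pass as the second argument, assuming it matches the code used in the URL.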