I have this PHP code:
$main_url = "http://www.sports-reference.com/olympics/countries/DEN/summer/1896/";
$main_html=file_get_html($main_url);
$link = $main_html->getElementById('div_sports');
foreach ($link->find('td') as $element){
foreach($element->find('href') as $node){
echo $node->item(0)->nodeValue . "\n";
//$link_clean = $node->getAttribute('href');
echo $link_clean . "\n";
}
}
If I print $element, I get this output:
<td align="left" ><a href="/olympics/countries/DEN/summer/1896/ATH/">Athletics</a></td>
<td align="left" ><a href="/olympics/countries/DEN/summer/1896/FEN/">Fencing</a></td>
<td align="left" ><a href="/olympics/countries/DEN/summer/1896/GYM/">Gymnastics</a></td>
<td align="left" ><a href="/olympics/countries/DEN/summer/1896/SHO/">Shooting</a></td>
<td align="left" ><a href="/olympics/countries/DEN/summer/1896/WLT/">Weightlifting</a></td>
I need to extract this information:
/olympics/countries/DEN/summer/1896/ATH/ /olympics/countries/DEN/summer/1896/FEN/ ..........
and so on. The code above does not work. Can you help?
Answer 0 (score: 2)
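href is not a tag but an attribute of a tag. You therefore have to search for the <a> elements and read the attribute from each one: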
foreach( $element->find('a') as $a)
{
echo $a->href . "\n";
(...)
}
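
For comparison, the same idea (find the <a> elements, then read their href attribute) can be sketched with PHP's built-in DOM extension instead of the Simple HTML DOM library used in the question. This is only an illustration under assumptions: the $html string is a hard-coded sample built from two of the rows shown in the question, wrapped in a <table> so that it parses as a fragment, and the variable names $doc and $links are made up for this example.

```php
<?php
// Sample markup copied from the question's output (assumption: wrapped in
// <table><tr> here only so the fragment parses cleanly on its own).
$html = '<table><tr>'
      . '<td align="left" ><a href="/olympics/countries/DEN/summer/1896/ATH/">Athletics</a></td>'
      . '<td align="left" ><a href="/olympics/countries/DEN/summer/1896/FEN/">Fencing</a></td>'
      . '</tr></table>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
// Walk every <td>, then every <a> inside it, and read the href attribute.
foreach ($doc->getElementsByTagName('td') as $td) {
    foreach ($td->getElementsByTagName('a') as $a) {
        $links[] = $a->getAttribute('href');
    }
}

// Print one link per line.
echo implode("\n", $links) . "\n";
```

With the live page, the hard-coded string would be replaced by the downloaded HTML. The point is the same as in the answer: the lookup targets the <a> tag, and href is read from it as an attribute.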