How to retrieve the last occurrence of a.page_arrows
<div class="page-nav">
<a class="paginationNumberStyle page_arrows" data-url="/Building-Materials-Concrete-Cement-Masonry/h_d1/N-5yc1vZ25ecodZarlk/h_d2/Navigation?catalogId=10053&Nu=P_PARENT_ID&langId=-1&Nao=384&storeId=10051">
<img alt="" src="/static/images/layout/triangle-green-left.gif"></a>
<span>6</span>
<a class="paginationNumberStyle" data-url="/Building-Materials-Concrete-Cement-Masonry/h_d1/N-5yc1vZ25ecodZarlk/h_d2/Navigation?catalogId=10053&Nu=P_PARENT_ID&langId=-1&Nao=576&storeId=10051">7</a>
<a class="paginationNumberStyle" data-url="/Building-Materials-Concrete-Cement-Masonry/h_d1/N-5yc1vZ25ecodZarlk/h_d2/Navigation?catalogId=10053&Nu=P_PARENT_ID&langId=-1&Nao=672&storeId=10051">8</a>
<a class="paginationNumberStyle page_arrows" data-url="/Building-Materials-Concrete-Cement-Masonry/h_d1/N-5yc1vZ25ecodZarlk/h_d2/Navigation?catalogId=10053&Nu=P_PARENT_ID&langId=-1&Nao=576&storeId=10051">
<img alt="" src="/static/images/layout/triangle-green-right.gif"></a>
</div>
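I am trying to collect the links on a page, then move to the next page and collect the rest, repeating until there are no more pages. Here is my code: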
// Assumes the Simple HTML DOM library file is on the include path.
require_once 'simple_html_dom.php';
getLinks('http://www.homedepot.com/Building-Materials-Concrete-Cement-Masonry/h_d1/N-5yc1vZ25ecodZarlk/h_d2/Navigation?catalogId=10053&Nu=P_PARENT_ID&langId=-1&storeId=10051&currentPLP=true&omni=c_Concrete,%20Cement%20&%20Masonry&searchNav=true');
function getLinks($URL) {
    $html = file_get_contents($URL);
    $dom = new simple_html_dom();
    $dom->load($html);
    // Print every product link found on the current page.
    foreach ($dom->find('a[class=item_description]') as $href) {
        $url = $href->href;
        echo $url.'<br>';
    }
    // Take the first pagination link (index 0) and recurse into the page it points to.
    if ($nextPage = $dom->find("a[class=paginationNumberStyle]", 0)) {
        $nextPageURL = 'http://www.homedepot.com'.$nextPage->getAttribute('data-url');
        // Free the parsed document before recursing to limit memory use.
        $dom->clear();
        unset($dom);
        getLinks($nextPageURL);
    } else {
        echo "\nEND";
        $dom->clear();
        unset($dom);
    }
}
Answer 0 (score: 1)
I ran into the same problem and used the children() method to grab only the first-level items.
<ul class="my-list">
<li>
<a href="#">Some Text</a>
<ul>
<li><a href="#">Some Inner Text</a></li>
<li><a href="#">Some Inner Text</a></li>
<li><a href="#">Some Inner Text</a></li>
<li><a href="#">Some Inner Text</a></li>
</ul>
</li>
<li>
<a href="#">Some Text</a>
<ul>
<li><a href="#">Some Inner Text</a></li>
<li><a href="#">Some Inner Text</a></li>
<li><a href="#">Some Inner Text</a></li>
<li><a href="#">Some Inner Text</a></li>
</ul>
</li>
</ul>
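Here is the Simple HTML DOM code that gets only the first-level li items:
$html = file_get_html( $url );
// children() returns only the direct child nodes of the matched ul, not the nested li items.
$first_level_items = $html->find( '.my-list', 0 )->children();
foreach ( $first_level_items as $item ) {
    // ... do stuff ...
}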
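For the pagination markup in the question, a related sketch: Simple HTML DOM's find() accepts a negative index that counts from the end, so -1 returns the last a.page_arrows. The helper name findNextPageUrl, the shortened data-url values, and the check on the arrow image name are my own assumptions for illustration, not something the site or the library prescribes. It also assumes simple_html_dom.php is available on the include path.

```php
<?php
// Assumption: the Simple HTML DOM library file is on the include path.
require_once 'simple_html_dom.php';

/**
 * Hypothetical helper: returns the data-url of the "next page" arrow,
 * or null when the document has no such arrow.
 */
function findNextPageUrl($dom)
{
    // A negative index counts from the end, so -1 is the last a.page_arrows.
    $arrow = $dom->find('a.page_arrows', -1);
    if ($arrow === null) {
        return null;
    }
    // Assumption: on the final page the only remaining arrow may point backwards,
    // so accept the link only if its image is the right-pointing triangle.
    $img = $arrow->find('img', 0);
    if ($img === null || strpos($img->src, 'triangle-green-right') === false) {
        return null;
    }
    return $arrow->getAttribute('data-url');
}

// Trimmed copy of the question's markup; the data-url values are shortened placeholders.
$markup = <<<HTML
<div class="page-nav">
<a class="paginationNumberStyle page_arrows" data-url="/prev?Nao=384"><img alt="" src="/static/images/layout/triangle-green-left.gif"></a>
<span>6</span>
<a class="paginationNumberStyle" data-url="/page?Nao=576">7</a>
<a class="paginationNumberStyle page_arrows" data-url="/next?Nao=576"><img alt="" src="/static/images/layout/triangle-green-right.gif"></a>
</div>
HTML;

$dom = str_get_html($markup);
$next = findNextPageUrl($dom);
echo $next === null ? 'END' : $next;
```

In getLinks() from the question, the returned value could stand in for the result of find("a[class=paginationNumberStyle]", 0), which picks the first pagination link (the left arrow in the markup shown) rather than the last one.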