I am trying to use PHP's DOMDocument to extract an href from a page. For example, the URL is:
trovaprezzi.it/categoria.aspx?id=-1&libera=frigorifero+lg
The link I want to extract is the one labeled 'Frigoriferi e Congelatori'. From the source of $url I should get 'trovaprezzi.it/prezzo_frigoriferi-congelatori_frigorifero_lg.aspx', but the link changes with the search. For example, on the page 'trovaprezzi.it/categoria.aspx?id=-1&libera=lavatrice+lg' I need to extract the first link, 'trovaprezzi.it/prezzo_lavatrici-asciugatrici_lavatrice_lg.aspx'.
Here is a sketch of my code:
$url = 'http://www.trovaprezzi.it/categoria.aspx?id=-1&libera=frigorifero+lg';
$html = file_get_contents($url);
$dom = new DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('/html/body/div[@class="catsMI"]/div')->getElementsByTagName('a')->item(0)->getAttribute('href') ;
echo $nodes;
Thanks in advance for your help.
Update 23/06
A sample of the markup I want to extract the link from:
<div class="catsMI">
<div><a title="confronta i prezzi Frigoriferi e Congelatori" href="/prezzo_frigoriferi-congelatori_frigorifero_lg.aspx">Frigoriferi e Congelatori</a><span>(732 prezzi)</span></div>
<div><a title="confronta i prezzi Ricambi Elettrodomestici" href="/prezzo_ricambi-elettrodomestici_frigorifero_lg.aspx">Ricambi Elettrodomestici</a><span>(191 prezzi)</span></div>
</div>
I want this URL:
/prezzo_frigoriferi-congelatori_frigorifero_lg.aspx
Answer 0 (score: 0)
DOMXPath::query() returns a DOMNodeList, which has no getElementsByTagName() method. Only DOMDocument and DOMElement have that method.
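To make the distinction concrete, here is a minimal sketch that checks the types involved. The inline markup and the `/a.aspx` href are hypothetical stand-ins, not taken from the real page. It shows that the query result is a list without the method, while each item in that list is a DOMElement that does have it.

```php
<?php
// Hypothetical inline markup standing in for the real page.
$html = '<div class="catsMI"><div><a href="/a.aspx">A</a></div></div>';

$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$list = $xpath->query('//div[@class="catsMI"]/div');

// The query result is a node list, not an element,
// so it has no getElementsByTagName() method.
$isList = $list instanceof DOMNodeList;
$listHasMethod = method_exists($list, 'getElementsByTagName');

// Each item in the list is a DOMElement, which does have the method.
$first = $list->item(0);
$isElement = $first instanceof DOMElement;
$href = $first->getElementsByTagName('a')->item(0)->getAttribute('href');

var_dump($isList, $listHasMethod, $isElement);
echo $href, PHP_EOL;
```

In other words, the original chain would need an `->item(0)` between `query(...)` and `getElementsByTagName('a')` to work at all, though selecting the `<a>` directly in the XPath expression is shorter.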
I haven't looked at the page you linked (please add a minimal viable sample of the site's HTML to your question), but try the following:
// search for all <a> elements that have a href attribute
// which are descendants of //div[@class="catsMI"]/div
$nodes = $xpath->query( '//div[@class="catsMI"]/div//a[@href]' );
// check if we found any nodes...
if( $nodes->length > 0 ) {
    // if we did: get href attribute of the first node we found
    $href = $nodes->item( 0 )->getAttribute( 'href' );
    echo $href;
}
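As a self-contained check, the same query can be run against the markup posted in the question's update instead of the live site. This is only a sketch: the live page may wrap this block differently, and the `$base` prefix used to build an absolute URL is an assumption taken from the host in the question's `$url`.

```php
<?php
// Markup copied from the question's update; the live page may differ.
$html = <<<'HTML'
<div class="catsMI">
<div><a title="confronta i prezzi Frigoriferi e Congelatori" href="/prezzo_frigoriferi-congelatori_frigorifero_lg.aspx">Frigoriferi e Congelatori</a><span>(732 prezzi)</span></div>
<div><a title="confronta i prezzi Ricambi Elettrodomestici" href="/prezzo_ricambi-elettrodomestici_frigorifero_lg.aspx">Ricambi Elettrodomestici</a><span>(191 prezzi)</span></div>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@class="catsMI"]/div//a[@href]');

// Take the href of the first matching link, if there is one.
$href = null;
if ($nodes->length > 0) {
    $href = $nodes->item(0)->getAttribute('href');
}

// Assumption: the href is site-relative, so prefix the host from $url.
$base = 'http://www.trovaprezzi.it';
$absolute = $href !== null ? $base . $href : null;

echo $href, PHP_EOL;
echo $absolute, PHP_EOL;
```

Because the expression starts with `//`, it does not depend on the `div` being a direct child of `body`, which is one likely reason the absolute path `/html/body/div[...]` in the question would match nothing on the real page.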