Using DomDocuments

Date: 2017-06-22 18:07:32

Tags: php web-scraping domdocument

I'm trying to use PHP's DOMDocument to extract an href from a page. For example, the URL is:

trovaprezzi.it/categoria.aspx?id=-1&libera=frigorifero+lg

The link I want to extract is the one under 'Frigoriferi e Congelatori'. Here is a sketch of my code:

I should extract this link, 'trovaprezzi.it/prezzo_frigoriferi-congelatori_frigorifero_lg.aspx', from the page source of $url. However, the link changes with the search: for example, on the page 'trovaprezzi.it/categoria.aspx?id=-1&libera=lavatrice+lg' I need to extract the first link, 'trovaprezzi.it/prezzo_lavatrici-asciugatrici_lavatrice_lg.aspx'.
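Since only the `libera` search term differs between the two pages, the category URL for any term could be built with PHP's built-in `http_build_query()`. This is a minimal sketch: the `buildCategoryUrl()` helper name is hypothetical, and the `id=-1` parameter is simply copied from the URLs above, its meaning is not known.

```php
<?php
// Hypothetical helper: build the trovaprezzi.it category/search URL for a term.
// The id=-1 parameter is copied from the example URLs; its meaning is assumed.
function buildCategoryUrl(string $term): string
{
    // http_build_query() encodes spaces as "+" by default (PHP_QUERY_RFC1738)
    $query = http_build_query(['id' => -1, 'libera' => $term]);
    return 'http://www.trovaprezzi.it/categoria.aspx?' . $query;
}

echo buildCategoryUrl('frigorifero lg'), "\n";
echo buildCategoryUrl('lavatrice lg'), "\n";
```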

$url = 'http://www.trovaprezzi.it/categoria.aspx?id=-1&libera=frigorifero+lg';
$html = file_get_contents($url);
$dom = new DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('/html/body/div[@class="catsMI"]/div')->getElementsByTagName('a')->item(0)->getAttribute('href')  ;
echo $nodes;

Thanks in advance for your help.

Update 23/06

Sample of the HTML containing the link I want to extract:

<div class="catsMI">
        <div><a title="confronta i prezzi Frigoriferi e Congelatori" href="/prezzo_frigoriferi-congelatori_frigorifero_lg.aspx">Frigoriferi e Congelatori</a><span>(732 prezzi)</span></div>
        <div><a title="confronta i prezzi Ricambi Elettrodomestici" href="/prezzo_ricambi-elettrodomestici_frigorifero_lg.aspx">Ricambi Elettrodomestici</a><span>(191 prezzi)</span></div>
</div>

I want this URL:

/prezzo_frigoriferi-congelatori_frigorifero_lg.aspx

1 Answer

Answer 0 (score: 0)

DOMXPath::query() returns a DOMNodeList, which has no getElementsByTagName() method. Only DOMDocument and DOMElement have that method.
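If you want to keep the original two-step approach, the fix is to take a node out of the DOMNodeList with `item(0)` first, and only then call `getElementsByTagName()` on that DOMElement. A minimal sketch, run offline against the sample HTML from the question's update (wrapped in `<html><body>` as an assumption about the surrounding page), with checks for the case where nothing matches:

```php
<?php
// Sample HTML copied from the question's update, wrapped in html/body tags
$html = <<<'HTML'
<html><body>
<div class="catsMI">
  <div><a title="confronta i prezzi Frigoriferi e Congelatori" href="/prezzo_frigoriferi-congelatori_frigorifero_lg.aspx">Frigoriferi e Congelatori</a><span>(732 prezzi)</span></div>
  <div><a title="confronta i prezzi Ricambi Elettrodomestici" href="/prezzo_ricambi-elettrodomestici_frigorifero_lg.aspx">Ricambi Elettrodomestici</a><span>(191 prezzi)</span></div>
</div>
</body></html>
HTML;

$dom = new DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList of the child divs of div.catsMI
$divs = $xpath->query('//div[@class="catsMI"]/div');
$href = null;
$firstDiv = $divs->item(0);
if ($firstDiv instanceof DOMElement) {
    // getElementsByTagName() exists on DOMElement, not on DOMNodeList
    $link = $firstDiv->getElementsByTagName('a')->item(0);
    if ($link instanceof DOMElement) {
        $href = $link->getAttribute('href');
    }
}
echo $href, "\n";
```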

I haven't looked at the page you linked (please add a minimal working sample of the site's HTML to your question), but try the following:

// search for all <a> elements that have a href attribute
// which are descendants of //div[@class="catsMI"]/div
$nodes = $xpath->query( '//div[@class="catsMI"]/div//a[@href]' );

// check if we found any nodes...
if( $nodes->length > 0 ) {
   // if we did: get href attribute of the first node we found
   $href = $nodes->item( 0 )->getAttribute( 'href' );
   echo $href;
}
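If you prefer to get a plain string directly, `DOMXPath::evaluate()` returns the result of an XPath `string()` expression as a PHP string (an empty string when nothing matches). The question also asks for the full link rather than the relative `href`; the sketch below assumes the links are root-relative like in the sample HTML, so prefixing the scheme and host taken from `$url` with `parse_url()` is enough. The HTML string is an offline stand-in for `file_get_contents($url)`.

```php
<?php
$url = 'http://www.trovaprezzi.it/categoria.aspx?id=-1&libera=frigorifero+lg';
// Offline stand-in for file_get_contents($url): trimmed sample markup
$html = '<html><body><div class="catsMI">'
    . '<div><a href="/prezzo_frigoriferi-congelatori_frigorifero_lg.aspx">Frigoriferi e Congelatori</a></div>'
    . '<div><a href="/prezzo_ricambi-elettrodomestici_frigorifero_lg.aspx">Ricambi Elettrodomestici</a></div>'
    . '</div></body></html>';

$dom = new DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXPath($dom);

// (...)[1] picks the first matching href in document order; string() converts it
$href = $xpath->evaluate('string((//div[@class="catsMI"]/div//a/@href)[1])');

$absolute = '';
if ($href !== '') {
    // Assumption: href is root-relative ("/..."), so scheme + host is enough
    $parts = parse_url($url);
    $absolute = $parts['scheme'] . '://' . $parts['host'] . $href;
}
echo $absolute, "\n";
```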