Extracting HTML content with PHP

Time: 2013-08-14 02:19:00

Tags: php dom domdocument file-get-contents domxpath

I have the following code:

$html = file_get_contents("http://www.jabong.com/giordano-Dtlm60058-Black-Analog-Watch-267058.html");

$dom = new DOMDocument();


$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//*[@id="price_div"]/div[2]/span[2]');  //this catches all elements with 
var_dump($nodes); 

I want to extract the price from the page, but this XPath query returns no results.

1 answer:

Answer 0: (score: 0)

Did you ever solve this? Here is some code that works:

$html = file_get_contents("http://www.jabong.com/giordano-Dtlm60058-Black-Analog-Watch-267058.html");

//suppress parser warnings (the page in question has a lot of invalid markup)
libxml_use_internal_errors(true);

$dom = new DOMDocument();
//don't preserve whitespace (set on the document itself, before loading)
$dom->preserveWhiteSpace = false;
//as @Larry.Z comments, you forgot to load the $html
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

//assuming there can be more than one "price set" on each page
$prices = array();

$price_divs = $xpath->query('//div[@id="price_div"]');
foreach ($price_divs as $price_div) {
    $price=array();
    foreach ($price_div->childNodes as $price_item) {
        $content=trim($price_item->textContent);
        if ($content!='') $price[]=$content;
    } 
    $prices[]=$price;
}

echo '<pre>';
print_r($prices);
echo '</pre>';

Output:

Array
(
    [0] => Array
        (
            [0] => Save 66%
            [1] => Rs. 5850
            [2] => Rs. 1999
        )

)

If each page contains at most one price set, you can skip the $prices[] part and use $price alone.
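A minimal sketch of that single-price-set variant follows. To keep it runnable without a network request, it parses an inline HTML string; that markup is a hypothetical stand-in, and the structure of the real page may well differ. Only the id "price_div" and the three text values are taken from the code and output above.

```php
<?php
// Hypothetical sample markup standing in for the fetched page (an assumption,
// not the real page source). Only the id "price_div" comes from the original.
$html = '<html><body><div id="price_div">'
      . '<span>Save 66%</span> <span>Rs. 5850</span> <span>Rs. 1999</span>'
      . '</div></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Take only the first matching element, since a single price set is assumed.
$price = array();
$price_div = $xpath->query('//div[@id="price_div"]')->item(0);

if ($price_div !== null) {
    foreach ($price_div->childNodes as $price_item) {
        // Skip whitespace-only text nodes between the elements.
        $content = trim($price_item->textContent);
        if ($content !== '') {
            $price[] = $content;
        }
    }
}

print_r($price);
```

The null check covers the case where the page has no element with that id, so $price simply stays empty instead of triggering an error.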