How to parse HTML with PHP DOMDocument?

Asked: 2013-01-10 21:19:10

Tags: php html parsing domdocument

I have this block of HTML:

<div class="title">
    <a href="http://test.com/asus_rt-n53/p195257/">
        Asus RT-N53
    </a>
</div>
<table>
    <tbody>
        <tr>
            <td class="price-status">
                <div class="status">
                    <span class="available">Yes</span>
                </div>
                <div name="price" class="price">
                    <div class="uah">758<span> ua.</span></div>
                    <div class="usd">$&nbsp;62</div>
                </div>

How can I extract the link (http://test.com/asus_rt-n53/p195257/), the title (Asus RT-N53), and the price (758)?

Here is my code ($content is the page fetched with cURL):

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
$models = $xpath->query('//div[@class="title"]/a');
foreach ($models as $model) {
    echo $model->nodeValue;
    $prices = $xpath->query('//div[@class="uah"]');
    foreach ($prices as $price) {
        echo $price->nodeValue;
    }
}

1 answer:

Answer 0 (score: 0)

An ugly solution is to cast the price text to an integer, which keeps only the leading digits:

echo (int) $price->nodeValue;
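To see why the cast works here and where it can go wrong, here is a small sketch. The helper name leadingInt and the second sample string are made up for illustration; only "758 ua." corresponds to the HTML in the question.

```php
<?php
// Hypothetical helper, not part of the original answer.
function leadingInt(string $text): int
{
    // An explicit (int) cast reads the leading digits and ignores the rest.
    return (int) $text;
}

var_dump(leadingInt('758 ua.'));   // int(758)
var_dump(leadingInt('1 258 ua.')); // int(1): the cast stops at the first space
```

So the cast is fine for prices like 758, but it would silently truncate a price written with a thousands separator.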

Alternatively, you can query for the span inside the div and remove it from the price node (inside the prices loop):

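// Look up the span relative to the current $price node, so this also
// works when the page contains more than one price.
$span = $xpath->query('span', $price)->item(0);
if ($span !== null) {
    $price->removeChild($span);
}
echo $price->nodeValue;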
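If you would rather not modify the DOM, another option is to read only the first text node of each price div. This is a minimal sketch, not the original answer's method; the function name extractAmounts is hypothetical, and it assumes the number is always the first text child of the div, as in the question's HTML.

```php
<?php
// Hypothetical helper: returns the leading text of every <div class="uah">.
function extractAmounts(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $amounts = [];
    foreach ($xpath->query('//div[@class="uah"]') as $price) {
        // text()[1] is the first text child of $price; the <span> is skipped.
        $amounts[] = trim($xpath->evaluate('string(text()[1])', $price));
    }
    return $amounts;
}

print_r(extractAmounts('<div class="uah">758<span> ua.</span></div>'));
```

DOMXPath::evaluate() is used instead of query() because the expression returns a string rather than a node list.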

Edit

To retrieve the link, just call getAttribute() and ask for the href attribute:

$model->getAttribute('href')
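Putting the pieces together, the link, title, and price could be collected per product roughly as follows. This is a sketch under assumptions: the function name parseProducts is hypothetical, the sample markup is a shortened version of the question's HTML with the closing tags added, and the table holding the price is assumed to be the next sibling table after the title div.

```php
<?php
// Hypothetical helper combining getAttribute(), nodeValue and a relative XPath.
function parseProducts(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $products = [];
    foreach ($xpath->query('//div[@class="title"]/a') as $link) {
        // From the <a>, go up to the title div, then to the first table after
        // it, and read the first text node of its "uah" price div.
        $priceText = $xpath->evaluate(
            'string(../following-sibling::table[1]//div[@class="uah"]/text()[1])',
            $link
        );
        $products[] = [
            'url'   => $link->getAttribute('href'),
            'title' => trim($link->nodeValue),
            'price' => (int) $priceText,
        ];
    }
    return $products;
}

// Assumed sample: the question's markup, shortened and with closing tags added.
$sample = '<div class="title"><a href="http://test.com/asus_rt-n53/p195257/">
        Asus RT-N53
    </a></div>
<table><tbody><tr><td class="price-status">
    <div name="price" class="price">
        <div class="uah">758<span> ua.</span></div>
    </div>
</td></tr></tbody></table>';

print_r(parseProducts($sample));
```

Because the price query is relative to each link, every title is paired with its own price instead of printing all prices for every title.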