Question

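I'm using simpleHtmlDom to do some basic screen scraping, and I'm having trouble grabbing product prices. Sometimes it works and sometimes it doesn't. On top of that, I sometimes get multiple prices; for example, a site shows something like "normally $100... now $79.99". Any suggestions? Currently, I'm using this: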

$prices = array();
// find($selector, 0) returns null when nothing matches, so check before reading ->innertext
$selectors = array("[class*=price]", "[class*=msrp]", "[id*=price]", "[id*=msrp]", "[name*=price]", "[name*=msrp]");
foreach ($selectors as $selector) {
    $element = $html->find($selector, 0);
    if ($element !== null) {
        $prices[] = $element->innertext;
    }
}
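For comparison, the six attribute-substring selectors above can be expressed as one XPath query using PHP's built-in DOMDocument and DOMXPath classes. This is a minimal sketch against hypothetical sample markup (the class and id values are made up); unlike find(..., 0), it collects every matching element instead of only the first one per selector, and an element that matches nothing simply produces no entry.

```php
<?php
// Hypothetical sample markup; real product pages will differ.
$html = '<html><body>'
    . '<span class="product-price">$79.99</span>'
    . '<span id="msrp-value">$100</span>'
    . '</body></html>';

libxml_use_internal_errors(true); // tolerate malformed real-world HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Same substring tests as [class*=price], [id*=msrp], etc., combined with "or".
$query = '//*[contains(@class, "price") or contains(@class, "msrp")'
       . ' or contains(@id, "price") or contains(@id, "msrp")'
       . ' or contains(@name, "price") or contains(@name, "msrp")]';

$prices = array();
foreach ($xpath->query($query) as $element) {
    // textContent includes the text of all descendants of the element
    $prices[] = trim($element->textContent);
}

print_r($prices);
```

Matches come back in document order, so the list above would be the sale price followed by the MSRP.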

One site where I can't figure out how to get the price is Victoria's Secret... the price markup just looks like random HTML.
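One common reason a price "looks like random HTML" is that it is split across several child elements. The sketch below uses hypothetical markup (the actual markup of the site mentioned is not shown here, and the class names "currency", "dollars" and "cents" are made up) to show the effect: the container's textContent joins the pieces into "$7999" with no decimal point, and a regex run on individual text nodes would only ever see the lone "$". Reading the parts separately and reassembling them is one possible workaround, assuming the parts can be located.

```php
<?php
// Hypothetical markup: one price split across several child elements.
$html = '<html><body><div class="price">'
    . '<span class="currency">$</span>'
    . '<span class="dollars">79</span>'
    . '<sup class="cents">99</sup>'
    . '</div></body></html>';

libxml_use_internal_errors(true); // tolerate malformed real-world HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$price = null;
$container = $xpath->query('//*[contains(@class, "price")]')->item(0);
if ($container !== null) {
    // textContent joins all descendant text, giving "$7999" with no decimal point,
    // so read the (hypothetical) dollars and cents parts separately instead.
    $dollars = $xpath->query('.//*[contains(@class, "dollars")]', $container)->item(0);
    $cents = $xpath->query('.//*[contains(@class, "cents")]', $container)->item(0);
    if ($dollars !== null && $cents !== null) {
        $price = (float) (trim($dollars->textContent) . '.' . trim($cents->textContent));
    }
    echo $container->textContent, "\n";
}
var_dump($price);
```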

Answer 1

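First, don't use simplehtmldom. Use PHP's built-in DOM functions (DOMDocument, DOMXPath) or a library built on them. If you want to extract every price on the page, you could try something like this: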

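$html = '<html><body>normally $100... now $79.99</body></html>';
libxml_use_internal_errors(true); // real-world markup is rarely valid HTML; keep parser warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Look only at text nodes that contain a dollar sign
foreach ($xpath->query('//text()[contains(., "$")]') as $node) {
    // "$", then digits with optional thousands separators and an optional decimal part,
    // so "$100..." yields "$100" rather than "$100..."
    preg_match_all('/\$\d[\d,]*(?:\.\d+)?/', $node->nodeValue, $m);
    print_r($m[0]);
}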
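The matches are still strings, and the example page yields two of them. The sketch below converts each match to a number and picks one; the helper name parsePrice is hypothetical, and the rule "the lowest amount is the current (sale) price" is an assumption that fits the "normally $100... now $79.99" case but will not hold on every site. The regex stops before trailing dots so "$100..." becomes "$100".

```php
<?php
// Hypothetical helper: turn "$1,299.00" into 1299.0 by dropping "$" and commas.
function parsePrice(string $text): float
{
    return (float) str_replace(array('$', ','), '', $text);
}

$text = 'normally $100... now $79.99';
preg_match_all('/\$\d[\d,]*(?:\.\d+)?/', $text, $m);

$amounts = array_map('parsePrice', $m[0]);
// Assumption: when a page lists several prices (list price vs. sale price),
// the lowest one is the price currently being charged.
$current = empty($amounts) ? null : min($amounts);

print_r($amounts);
var_dump($current);
```

If a site also shows unrelated amounts (shipping, "save $20"), a positional or label-based rule, such as taking the number after "now", is likely to be more reliable than min().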

php dom scraping - best way to get product prices
