I want to get Groupon's active deals, so I wrote a scraper like this:
libxml_use_internal_errors(true);
$dom = new DOMDocument();
@$dom->loadHTMLFile('https://www.groupon.com/browse/new-york?category=food-and-drink&minPrice=1&maxPrice=999');
$xpath = new DOMXPath($dom);
$entries = $xpath->query("//li[@class='slot']//a/@href");
foreach ($entries as $e) {
    echo $e->textContent . '<br />';
}
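But when I run this, the browser just keeps loading. It loads for a while, but nothing is output and no error is shown.
How can I fix this? It's not just Groupon: I tried other websites too, and they don't work either. Why?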
Answer 0 (score: 0)
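How about loading the page data with cURL?
"It's not just Groupon: I tried other websites too, and they don't work either."
I think this code will help you, but expect each website you want to scrape to need its own special handling.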
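<?php
libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them

$data = get_url_content('https://www.groupon.com', true);
if ($data === false || $data === '') {
    exit('Could not fetch the page.');
}

$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$entries = $xpath->query("//label");
foreach ($entries as $e) {
    echo $e->textContent . '<br />';
}

function get_url_content($url = null, $justBody = true)
{
    /* Init cURL */
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Include the response headers only when the caller asks for them,
    // so the body never has to be split off by hand
    curl_setopt($ch, CURLOPT_HEADER, !$justBody);
    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    // HTTP_USER_AGENT is not set when running from the CLI, so fall back to a generic value
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Assumed limits: give up instead of loading forever
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    return curl_exec($ch); // false on failure
}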
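Since every site can fail in its own way, it may help to see why a request failed and to make the XPath class test less strict. The following is only a sketch: `fetch_with_diagnostics` and `extract_slot_links` are hypothetical helper names, the 30-second timeout is an assumption, and I have not checked what markup Groupon actually serves. Note that `@class='slot'` matches only when the attribute is exactly `slot`, so an element with `class="slot featured"` would be skipped. The query below matches `slot` as one class among several.

```php
<?php
// Hypothetical helper: fetch a URL and report what happened instead of failing silently.
function fetch_with_diagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30); // assumed limit; tune per site
    $body = curl_exec($ch);

    return [
        'ok'     => $body !== false,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if there was no HTTP response
        'error'  => curl_error($ch),                        // empty string when nothing went wrong
        'body'   => $body === false ? '' : $body,
    ];
}

// Hypothetical helper: collect href values of links inside <li> elements
// whose class list contains the token "slot".
function extract_slot_links(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the class attribute with spaces so "slot" matches only as a whole token
    $query = "//li[contains(concat(' ', normalize-space(@class), ' '), ' slot ')]//a/@href";

    $links = [];
    foreach ($xpath->query($query) as $attr) {
        $links[] = $attr->nodeValue;
    }
    return $links;
}
```

You could call `fetch_with_diagnostics($url)` first and print `status` and `error` when `ok` is false, then pass `body` to `extract_slot_links`. If the request succeeds but the list is empty, one possible cause is that the site builds its listing with JavaScript, in which case the raw HTML will not contain those elements at all.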