simple_html_dom: trying to find a height in a Google search

Time: 2015-04-23 20:02:58

Tags: php web-scraping simple-html-dom scrape

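Can anyone explain what is wrong with my code, and how I can get the height value? I want to get a celebrity's height. Any suggestions?

Thanks.

My code (updated with the cURL user agent setting, as suggested):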

$url='https://www.google.com/webhp?ie=UTF-8#q=ailee+height';

//Set CURL user agent
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);

$data = curl_exec($ch);
curl_close($ch);

//simple html dom
require_once('lib/simple_html_dom.php');
$html = str_get_html($data);
$height= $html->find('div[class="_eF"]',0)->innertext;
echo $height;

The code above gives me an empty result. In this case, I want it to return:

5' 5" (1.65 m)

1 answer:

Answer 0 (score: 1)

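The problem is that cURL does not execute JavaScript, and Google serves a different page when JavaScript is disabled. In that page the div you are looking for is gone, and the value sits in a span with a different class:

<span class="_m3b">1.65 m</span>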
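To see why the selector has to change, here is a minimal sketch that runs both selectors against a hard-coded sample. It uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs without the library. The sample markup is an assumption built around the span shown above; the page Google actually returns is much larger and its class names change over time.

```php
<?php
// Hypothetical sample of the no-JavaScript markup.
// Only the span itself is taken from the answer; the wrapper is made up.
$sample = '<html><body><div><span class="_m3b">1.65 m</span></div></body></html>';

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Selector from the question: no such div exists in this markup
$oldMatches = $xpath->query('//div[@class="_eF"]');
// Selector matching the span shown above
$newMatches = $xpath->query('//span[@class="_m3b"]');

$oldCount = $oldMatches->length;
$height = $newMatches->length > 0 ? $newMatches->item(0)->textContent : null;

echo $oldCount, "\n"; // 0
echo $height, "\n";   // 1.65 m
```

Because the old selector matches nothing, reading a property from its result is what produces the empty output (or an error) in the question.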

Also, the link you are using does not work for me.

Try this instead:

<?php
header('Content-Type: text/html; charset=utf-8');
$url='https://www.google.pt/search?q=ailee+height&num=10&gbv=1';

//Set CURL user agent
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);

$data = curl_exec($ch);
curl_close($ch);

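require_once('simple_html_dom.php');
$html = str_get_html($data);
// find() returns null when nothing matches, so check before reading innertext
$span = $html ? $html->find('span[class="_m3b"]', 0) : null;
echo $span ? $span->innertext : 'Height not found';
//1.65 m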
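One likely reason the original link fails with cURL can be seen by comparing the two URLs. In the question's URL the search terms come after the #, which makes them a URL fragment. Fragments are handled by the client and are not part of the HTTP request, so the server never sees them. The sketch below only takes the two URLs apart with parse_url and parse_str; it makes no network request, and the variable names are illustrative.

```php
<?php
// URL from the question: the search terms sit after '#'
$questionUrl = 'https://www.google.com/webhp?ie=UTF-8#q=ailee+height';
// URL from the answer: the search terms are in the query string
$answerUrl = 'https://www.google.pt/search?q=ailee+height&num=10&gbv=1';

$questionParts = parse_url($questionUrl);
$answerParts = parse_url($answerUrl);

// Decode each query string into an array of parameters
parse_str($questionParts['query'], $questionParams);
parse_str($answerParts['query'], $answerParams);

// The question's query string carries only the encoding parameter
echo isset($questionParams['q']) ? 'q sent' : 'q not sent', "\n"; // q not sent
// The search terms ended up in the fragment instead
echo $questionParts['fragment'], "\n"; // q=ailee+height
// The answer's URL carries the search terms in the query string
echo $answerParams['q'], "\n"; // ailee height
```

With the question's URL, the server receives only ie=UTF-8, so the page that comes back is not a results page for the search at all.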