Scraping a specific div from a Flipkart.com product page with cURL and PHP

Time: 2016-03-26 12:15:18

Tags: php curl web-scraping

I want to copy the data inside a specific element of a Flipkart product page and display it.

<table cellspacing="0" class="specTable">
///// contains  /////
</table>

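The number of tables varies: some pages have 10 tables with this same class, and others have more. How can I get the values from all of them?

I would also like to get a specific specsValue. Is that possible too?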

<td class="specsKey">Brand</td><td class="specsValue">Apple</td>

Page URL: http://www.flipkart.com/apple-iphone-6/p/itme8ra5z7yx5c9j?pid=MOBEYHZ2JHVFHFBG

Sample code:

<?php

$url = "http://dl.flipkart.com/dl/apple-iphone-6/p/itme8ra5z7yx5c9j?pid=MOBEYHZ2JHVFHFBG";

$response = getPriceFromFlipkart($url);

// Print the response in JSON format
echo json_encode($response);

/* Fetches the product page and returns price, title and image as an array */
function getPriceFromFlipkart($url) {
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 10.10; labnol;) ctrlq.org");
    curl_setopt($curl, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as a failure
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $html = curl_exec($curl);
    curl_close($curl);

    // curl_exec() returns false on failure; use an empty string so the patterns below simply do not match
    if ($html === false) {
        $html = '';
    }

    // Price from the itemprop="price" meta tag
    preg_match('/<meta itemprop="price" content="([^"]*)"/', $html, $price);

    // Title from the first <h1>
    preg_match('/<h1[^>]*>([^<]*)<\/h1>/', $html, $title);

    // Image from the first data-src attribute
    preg_match('/data-src="([^"]*)"/i', $html, $image);

    if ($price && $title && $image) {
        $response = array("price" => $price[1], "title" => $title[1], "image" => $image[1]);
    } else {
        $response = array("status" => "404", "error" => "We could not find the product details on Flipkart $url");
    }

    return $response;
}

?>
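
For the repeated specTable markup, a DOM parser may be easier to work with than regular expressions. The following is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The function name extractSpecs() and the sample markup are made up for illustration; only the class names (specTable, specsKey, specsValue) come from the question, and the real page may be structured differently.

```php
<?php
// Sketch only: extractSpecs() is a hypothetical helper, and the sample
// markup below is invented to mirror the class names quoted in the question.
function extractSpecs(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $specs = [];

    // Match every table whose class list contains "specTable", however many there are.
    $tables = $xpath->query('//table[contains(concat(" ", normalize-space(@class), " "), " specTable ")]');
    foreach ($tables as $table) {
        foreach ($xpath->query('.//td[@class="specsKey"]', $table) as $keyCell) {
            // Assumption: the value sits in the next <td class="specsValue"> sibling.
            $valueCell = $xpath->query('following-sibling::td[@class="specsValue"][1]', $keyCell)->item(0);
            if ($valueCell !== null) {
                $specs[trim($keyCell->textContent)] = trim($valueCell->textContent);
            }
        }
    }

    return $specs;
}

$sample = '<table cellspacing="0" class="specTable"><tr><td class="specsKey">Brand</td><td class="specsValue">Apple</td></tr></table>'
        . '<table cellspacing="0" class="specTable"><tr><td class="specsKey">Model Name</td><td class="specsValue">iPhone 6</td></tr></table>';

$specs = extractSpecs($sample);
echo $specs['Brand'], "\n"; // prints "Apple"
```

Because the result is keyed by the specsKey text, a single value can be looked up directly (for example $specs['Brand']), while iterating over the array gives every key/value pair from all matching tables. If two tables repeat the same key, the later one overwrites the earlier one in this sketch.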

1 answer:

Answer 0 (score: 0)

Flipkart has since changed its interface, so you can use the Flipkart API to get product prices. I am currently using their API as well.

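However, I would also like to fetch the product details with the cURL code below. If anyone has this working without problems, please share what else I should add to retrieve the product page content. When I debug with curl_getinfo(), it returns Status Code 0 with 301 Moved Permanently:
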
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, '<flipkart_url>'); // placeholder for the product page URL
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 100);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_REFERER, 'http://www.flipkart.com/');
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');

$str = curl_exec($curl_handle);

// simple_html_dom is provided by the PHP Simple HTML DOM Parser library, which must be included first
$html = new simple_html_dom();
$html->load($str);
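
The code above does not set CURLOPT_FOLLOWLOCATION, so cURL stops at a redirect response instead of following it. A minimal sketch of a fetch helper that follows redirects and reports what happened is shown below. fetchPage() and the keys of its return array are hypothetical names chosen for illustration, and whether Flipkart serves the product page after the redirect is not something this sketch can guarantee.

```php
<?php
// Sketch only: fetchPage() is a hypothetical helper, not part of the code above.
function fetchPage(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 301/302 responses
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // give up after five redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    $body = curl_exec($ch);

    $result = [
        'body'      => $body === false ? null : $body,
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),     // 0 if no HTTP response was received
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), // URL after any redirects
        'error'     => $body === false ? curl_error($ch) : null,
    ];

    curl_close($ch);
    return $result;
}
```

Calling fetchPage() with the product URL and inspecting 'status', 'final_url' and 'error' should show whether the request failed before any HTTP response arrived or where the redirect chain ended. The body can then be passed to the HTML parser only when it is not null.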