How to get product specifications and features from Flipkart

Time: 2016-04-06 18:45:01

Tags: php regex web-scraping preg-match

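I am trying to get the product specifications and feature list from Flipkart. I am using the code below, but I cannot get the specifications.

Is there anything that would complete my code?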

<?php
$url = 'http://www.flipkart.com/samsung-415-l-frost-free-double-door-refrigerator/p/itmedp6zcppxvhgh?pid=RFREDP6ZJXFY5QMK&al=Xh6p4IpEIjAx6PWgfu6yt8ldugMWZuE7%2BW7da8XnwKRuC2TkVUlPYWLhfoM4PDZcEqn50nOHN48%3D&ref=L%3A1683055601844045008&srno=p_2&query=samsung+rt42&otracker=from-search';
$response = getPriceFromFlipkart($url);
/* Print the response in JSON format */
echo json_encode($response);

/* Returns the price, image and title as an array, or an error entry */
function getPriceFromFlipkart($url) {
    // Download the product page
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_USERAGENT, "Chrome/49.0.2623.110 (Windows; U; Windows NT 10.10; labnol;) ctrlq.org");
    curl_setopt($curl, CURLOPT_FAILONERROR, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($curl);
    curl_close($curl);

    // curl_exec() returns false on failure; treat that as an empty page
    if ($html === false) {
        $html = '';
    }

    // Price from the itemprop="price" meta tag
    $regex = '/<meta itemprop="price" content="([^"]*)"/';
    preg_match($regex, $html, $price);

    // Title from the first <h1> element
    $regex = '/<h1[^>]*>([^<]*)<\/h1>/';
    preg_match($regex, $html, $title);

    // Image URL from the data-zoomimage attribute
    $regex = '/data-zoomimage="([^"]*)"/i';
    preg_match($regex, $html, $image);

    if ($price && $title && $image) {
        $response = array("price" => "Rs. $price[1].00", "image" => $image[1], "title" => $title[1], "status" => "200");
    } else {
        $response = array("status" => "404", "error" => "We could not find the product details on Flipkart $url");
    }

    return $response;
}
?>

1 answer:

Answer 0: (score: 0)

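There is no magic regular expression that does this. You will probably need a combination of several regular expressions and code to get the specifications.

A starting point could be <div[^>]*>\s*Specifications\s*<\/div>(.*?)<div[^>]*>\s*Questions\s*and\s*Answers

See: https://regex101.com/r/Tanr0H/4

This captures the part of the HTML response from "Specifications" up to "Questions and Answers".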
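As a rough sketch of how that pattern could be applied with preg_match: the sample markup and the function name extractSpecificationsHtml below are made up for illustration, and the real Flipkart page will almost certainly look different. The s modifier is added so that . also matches newlines, which the pattern needs once the HTML spans several lines; the i modifier is an optional assumption.

```php
<?php
// Hypothetical sample markup, NOT the real Flipkart page.
$html = <<<'HTML'
<div class="title">Specifications</div>
<table><tr><td>Capacity</td><td>415 L</td></tr></table>
<div class="title">Questions and Answers</div>
HTML;

// Returns the HTML between the two headings, or null if they are not found.
function extractSpecificationsHtml(string $html): ?string
{
    // s: let . match newlines; i: ignore case (an assumption, not required).
    $regex = '/<div[^>]*>\s*Specifications\s*<\/div>(.*?)<div[^>]*>\s*Questions\s*and\s*Answers/si';
    if (preg_match($regex, $html, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

$specs = extractSpecificationsHtml($html);
echo $specs, PHP_EOL;
```

The captured fragment is still raw HTML, so further regular expressions or a parser would be needed to turn it into key/value pairs.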

See lazy vs. greedy to understand how (.*?) works: What do 'lazy' and 'greedy' mean in the context of regular expressions?
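A small illustration of the difference, using a made-up two-div string rather than anything from Flipkart: the greedy (.*) runs to the last closing tag, while the lazy (.*?) stops at the first one.

```php
<?php
// Hypothetical input with two sibling elements.
$html = '<div>first</div><div>second</div>';

// Greedy: .* consumes as much as possible, then backtracks to the LAST </div>.
preg_match('/<div>(.*)<\/div>/', $html, $greedy);

// Lazy: .*? consumes as little as possible, stopping at the FIRST </div>.
preg_match('/<div>(.*?)<\/div>/', $html, $lazy);

echo $greedy[1], PHP_EOL; // first</div><div>second
echo $lazy[1], PHP_EOL;   // first
```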

Also, it may be a good idea to use a library to parse the HTML:

How do you parse and process HTML/XML in PHP?
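One possible way to do this is with PHP's bundled DOM extension (DOMDocument and DOMXPath). This is only a sketch: the sample markup, the class name specTable, and the function name parseSpecifications are assumptions made for illustration, so the XPath expression would have to be adapted to whatever the actual page contains.

```php
<?php
// Hypothetical sample markup; the class name "specTable" is an assumption.
$html = <<<'HTML'
<html><body>
<table class="specTable">
  <tr><td>Capacity</td><td>415 L</td></tr>
  <tr><td>Type</td><td>Double Door</td></tr>
</table>
</body></html>
HTML;

// Returns the specification rows as an array of name => value pairs.
function parseSpecifications(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $specs = [];
    // Each row is expected to hold a name cell followed by a value cell.
    foreach ($xpath->query('//table[@class="specTable"]//tr') as $row) {
        $cells = $xpath->query('./td', $row);
        if ($cells->length >= 2) {
            $name = trim($cells->item(0)->textContent);
            $specs[$name] = trim($cells->item(1)->textContent);
        }
    }
    return $specs;
}

$specs = parseSpecifications($html);
echo json_encode($specs), PHP_EOL;
```

Compared with a regular expression, this approach tends to survive small changes in whitespace or attribute order, though it still depends on the page keeping the same overall structure.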