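I am trying to fetch the product specifications and feature list from Flipkart. I am using the code below, but I cannot get the specifications.
Is there anything I can add to make my code complete?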
<?php
$url = 'http://www.flipkart.com/samsung-415-l-frost-free-double-door-refrigerator/p/itmedp6zcppxvhgh?pid=RFREDP6ZJXFY5QMK&al=Xh6p4IpEIjAx6PWgfu6yt8ldugMWZuE7%2BW7da8XnwKRuC2TkVUlPYWLhfoM4PDZcEqn50nOHN48%3D&ref=L%3A1683055601844045008&srno=p_2&query=samsung+rt42&otracker=from-search';
$response = getPriceFromFlipkart($url);
echo json_encode($response);
/* Returns the response in JSON format */
function getPriceFromFlipkart($url) {
    // Fetch the product page, following redirects and returning the body as a string
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_USERAGENT, "Chrome/49.0.2623.110 (Windows; U; Windows NT 10.10; labnol;) ctrlq.org");
    curl_setopt($curl, CURLOPT_FAILONERROR, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($curl);
    curl_close($curl);

    // curl_exec() returns false on failure; use an empty string so the
    // patterns below simply do not match and the 404 response is returned
    if ($html === false) {
        $html = '';
    }

    // Price from the itemprop="price" meta tag
    $regex = '/<meta itemprop="price" content="([^"]*)"/';
    preg_match($regex, $html, $price);

    // Title from the first <h1> element
    $regex = '/<h1[^>]*>([^<]*)<\/h1>/';
    preg_match($regex, $html, $title);

    // Image URL from the data-zoomimage attribute
    $regex = '/data-zoomimage="([^"]*)"/i';
    preg_match($regex, $html, $image);

    if ($price && $title && $image) {
        $response = array("price" => "Rs. {$price[1]}.00", "image" => $image[1], "title" => $title[1], "status" => "200");
    } else {
        $response = array("status" => "404", "error" => "We could not find the product details on Flipkart $url");
    }
    return $response;
}
?>
Answer 0 (score: 0)
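There is no magic regular expression that does this. You will probably need a combination of several regular expressions and code to get the specifications.
A starting point could be <div[^>]*>\s*Specifications\s*<\/div>(.*?)<div[^>]*>\s*Questions\s*and\s*Answers
See: https://regex101.com/r/Tanr0H/4
This captures the HTML of the response from "Specifications" up to "Questions and Answers".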
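As a rough sketch, the pattern could be applied with preg_match as shown here. The $html string is a made-up stand-in for the downloaded page (the real Flipkart markup may well differ), and the "s" modifier is an assumption added so that "." also matches newlines when the section spans several lines.

```php
<?php
// Hypothetical sample input; replace with the HTML returned by curl_exec().
$html = '<div class="a">Specifications</div>'
      . '<table><tr><td>Capacity</td><td>415 L</td></tr></table>'
      . '<div class="b">Questions and Answers</div>';

// "s" lets "." match newlines; the lazy (.*?) stops at the first
// "Questions and Answers" heading that follows.
$pattern = '/<div[^>]*>\s*Specifications\s*<\/div>(.*?)<div[^>]*>\s*Questions\s*and\s*Answers/s';

// $section holds the captured HTML, or null when the pattern does not match.
$section = preg_match($pattern, $html, $m) ? $m[1] : null;

echo $section, "\n";
```

The captured fragment still contains markup, so a second pass (more patterns, or an HTML parser) is needed to turn it into individual specification entries.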
See lazy vs. greedy to understand how (.*?) works: What do 'lazy' and 'greedy' mean in the context of regular expressions?
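A small illustration of the difference, using an invented input string: the greedy (.*) runs to the last closing tag, while the lazy (.*?) stops at the first one.

```php
<?php
// Invented sample text with two <b> elements.
$text = '<b>one</b> and <b>two</b>';

// Greedy: ".*" consumes as much as possible, then backtracks to the LAST </b>.
preg_match('/<b>(.*)<\/b>/', $text, $greedy);

// Lazy: ".*?" consumes as little as possible, stopping at the FIRST </b>.
preg_match('/<b>(.*?)<\/b>/', $text, $lazy);

echo $greedy[1], "\n"; // one</b> and <b>two
echo $lazy[1], "\n";   // one
```

This is why the lazy form fits the specification pattern: it ends the capture at the first "Questions and Answers" heading rather than the last.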
Also, using a library to parse the HTML might be a good idea.
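One possible way to do that is with PHP's bundled DOM extension (DOMDocument and DOMXPath). This is only a sketch: the sample markup and the assumption that each specification is a two-cell table row are hypothetical, so the XPath query would need adjusting to the actual page structure.

```php
<?php
// Hypothetical specification markup; the real page layout may differ.
$html = '<html><body><table>'
      . '<tr><td>Capacity</td><td>415 L</td></tr>'
      . '<tr><td>Type</td><td>Double Door</td></tr>'
      . '</table></body></html>';

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$specs = array();

// Treat every table row with at least two cells as a "name => value" pair.
foreach ($xpath->query('//table//tr') as $row) {
    $cells = $xpath->query('./td', $row);
    if ($cells->length >= 2) {
        $name  = trim($cells->item(0)->textContent);
        $value = trim($cells->item(1)->textContent);
        $specs[$name] = $value;
    }
}

echo json_encode($specs), "\n";
```

A parser tolerates attribute order, whitespace and nesting changes that would break a regular expression, which tends to make the scraper less fragile.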