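I have been trying to scrape content from websites and have succeeded on some of them, but my code fails to scrape anything from flipkart.com. I am using the Simple HTML DOM Parser, and this is my code: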
<?php
include ('simple_html_dom.php');
$scrap_url = 'https://www.flipkart.com/lenovo-f309-2-tb-external-hard-disk-drive/p/itmehwha6zkhkgfw';
$html = file_get_html($scrap_url);
foreach ($html->find('h1._3eAQiD') as $title_s) {
    echo $title_s->plaintext;
}
foreach ($html->find('div.hGSR34') as $ratings_s) {
    echo $ratings_s->plaintext;
}
?>
This code prints an empty result. Can anyone tell me what the problem is? Is there another way to scrape content from this site?
Answer 0 (score: 0)
This code works for me.
<?php
// Fetch the product page and print every <div> whose class attribute starts with "hGSR34"
get_content_by_class(curl('https://www.flipkart.com/lenovo-f309-2-tb-external-hard-disk-drive/p/itmehwha6zkhkgfw'), "hGSR34");
function curl($url) {
    $ch = curl_init(); // Initialise the cURL handle
    //curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0); // 0 means wait indefinitely for the connection
    curl_setopt($ch, CURLOPT_URL, $url); // Set the URL to fetch from the $url argument
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response body instead of printing it
    $data = curl_exec($ch); // Execute the request; $data is false if it fails
    curl_close($ch); // Close the cURL handle
    return $data; // Return the page source (or false on failure)
}
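function get_content_by_class($html, $container_class_name) {
    //preg_match_all('/<div class="' . $container_class_name .'">(.*?)<\/div>/s', $html, $matches);
    // Match each <div> whose class attribute starts with the given name, up to the first closing </div>
    preg_match_all('#<\s*?div class="' . $container_class_name . '\b[^>]*>(.*?)</div\b[^>]*>#s', $html, $matches);
    // $matches[0] holds the full matches, $matches[1] the captured inner HTML
    foreach ($matches as $match) {
        // Escape the angle brackets so the markup is displayed as text in the browser
        $match1 = str_replace('<', '&lt;', $match);
        $match2 = str_replace('>', '&gt;', $match1);
        print_r($match2);
    }
    // preg_match_all always fills $matches with sub-arrays, so test the first one
    if (empty($matches[0])) {
        echo 'no matches found';
        echo '<br>';
    }
    //return $matches;
}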
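Because the regular expression above stops at the first closing </div>, a container with nested <div> elements is cut short. As a rough alternative sketch (not part of the original answer), the same lookup can be done with PHP's built-in DOMDocument and DOMXPath classes. The helper name get_text_by_class and the sample markup are made up for illustration, and the class name is assumed to contain no quote characters. It only finds elements that are present in the HTML string it is given.

```php
<?php
// Hypothetical helper (not from the original answer): returns the trimmed text
// of every element whose class attribute contains $class_name as a whole token.
function get_text_by_class(string $html, string $class_name): array {
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Common XPath 1.0 idiom for matching one token in a space-separated class list
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class_name
    );

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up sample markup that reuses the class names from the question
$sample = '<html><body><h1 class="_3eAQiD">Sample title</h1>'
        . '<div class="hGSR34 extra"><span>4.4</span></div></body></html>';

print_r(get_text_by_class($sample, 'hGSR34'));
print_r(get_text_by_class($sample, '_3eAQiD'));
```

To use it on a live page, pass the string returned by curl() above instead of $sample, after checking that the string is not false or empty.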