Unable to scrape content from a specific website using the Simple HTML DOM parser

Time: 2017-02-21 07:35:50

Tags: php simple-html-dom

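I have been trying to scrape content from websites and have succeeded with some of them. However, my code fails to scrape content from flipkart.com. I am using the Simple HTML DOM parser, and this is my code: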

<?php
include 'simple_html_dom.php';

$scrap_url = 'https://www.flipkart.com/lenovo-f309-2-tb-external-hard-disk-drive/p/itmehwha6zkhkgfw';
$html = file_get_html($scrap_url);

foreach ($html->find('h1._3eAQiD') as $title_s) {
    echo $title_s->plaintext;
}
foreach ($html->find('div.hGSR34') as $ratings_s) {
    echo $ratings_s->plaintext;
}
?>

This code produces an empty result. Can anyone tell me what the problem is? Is there another way to scrape content from this website?

1 answer:

Answer 0 (score: 0)

This code works for me.

<?php
// Fetch the page with cURL, then print every <div> whose class attribute starts with "hGSR34".
get_content_by_class(curl('https://www.flipkart.com/lenovo-f309-2-tb-external-hard-disk-drive/p/itmehwha6zkhkgfw'), 'hGSR34');

// Fetch $url with cURL and return the response body (false on failure).
function curl($url) {
    $ch = curl_init();  // Initialise cURL
    //curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_URL, $url);    // URL to request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response instead of printing it
    $data = curl_exec($ch); // Execute the request; $data is false on failure
    curl_close($ch);    // Close the handle
    return $data;
}

// Print each matching <div> and its inner HTML, escaped so the markup is visible in a browser.
function get_content_by_class($html, $container_class_name) {
    //preg_match_all('/<div class="' . $container_class_name .'">(.*?)<\/div>/s', $html, $matches);
    // The non-greedy (.*?) stops at the first </div>, so nested <div>s are cut short.
    $pattern = '#<\s*?div class="' . preg_quote($container_class_name, '#') . '\b[^>]*>(.*?)</div\b[^>]*>#s';
    preg_match_all($pattern, $html, $matches);

    // $matches[0] holds the full matches and $matches[1] the captured inner HTML.
    foreach ($matches as $match) {
        print_r(array_map('htmlspecialchars', $match));
    }

    // preg_match_all() always fills $matches with one array per group, so test $matches[0].
    if (empty($matches[0])) {
        echo 'no matches found';
        echo '<br>';
    }
    //return $matches;
}
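
As for the "other way" asked about in the question: the regex above cuts off at the first closing </div>, so a DOM-based lookup may be more robust once the HTML has been fetched. The following is only a sketch using PHP's built-in DOMDocument and DOMXPath. The helper name find_text_by_class and the sample markup are made up for illustration and are not part of the original answer. It also assumes the elements are present in the raw HTML; if the site builds them with JavaScript after load, or has since changed its generated class names, no parser will find them.

```php
<?php
// Hypothetical helper (not from the answer above): return the trimmed text of
// every element whose class attribute contains $class as a whole token.
function find_text_by_class(string $html, string $class): array
{
    if ($html === '') {
        return [];  // nothing to parse, e.g. the fetch failed
    }

    $previous = libxml_use_internal_errors(true);   // silence warnings from messy real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Common XPath 1.0 idiom for matching one token in a space-separated class list.
    // Assumption: $class contains no double quote.
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")]';

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up sample markup that only reuses the class name from the question.
$sample = '<div class="hGSR34 extra"><div>4.3</div> stars</div><div class="other">x</div>';
print_r(find_text_by_class($sample, 'hGSR34'));
```

To use it on the real page, pass the string returned by curl() above, after checking that it is not false. Because the match is on a whole class token, elements carrying additional classes still match, and nested <div>s are handled by the parser rather than by the pattern.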