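Scraping div content with PHP and cURL

Date: 2013-07-25 12:30:33

Tags: php regex curl

I am new to cURL. I have been trying to write the contents of this Amazon link (that is, the image, title, author, and price of 20 books) into an HTML page. So far I have managed to print the page with the following code: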

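<?php
// Fetch a URL with cURL and return the response body as a string.
function curl($url) {
    $options = array(
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_AUTOREFERER => true,     // set the Referer header when redirected
        CURLOPT_CONNECTTIMEOUT => 120,   // seconds to wait for the connection
        CURLOPT_TIMEOUT => 120,          // seconds allowed for the whole request
        CURLOPT_MAXREDIRS => 10,         // give up after 10 redirects
        CURLOPT_URL => $url,
    );

    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031";
$results_page = curl($url);
echo $results_page;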

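I tried using regular expressions and failed. I have been trying everything I can think of for six hours straight and am very tired, so I hope I can find a solution here. A thank-you is not enough for a solution, but thanks in advance. :)

Update: I found a very useful site (click here) for beginners like me; it does not use cURL.

1 answer:

Answer 0: (score: 1)

You really should use the AWSECommerce API, but here is an approach that uses Yahoo's YQL service: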

<?php
$query = sprintf(
    'http://query.yahooapis.com/v1/public/yql?q=%s',
    urlencode('SELECT * FROM html WHERE url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031" AND xpath=\'//div[@class="zg_itemImmersion"]\'')
);

$xml = new SimpleXMLElement($query, null, true);

foreach ($xml->results->div as $product) {
    vprintf("%s\n", array(
        $product->div[1]->div[1]->a,
    ));
}

/*
    Engineering Thermodynamics
    A Textbook of Fluids Mechanics
    The Design of Everyday Things
    A Forest History of India
    Computer Networking
    The Story of Microsoft
    Private Empire: ExxonMobil and Americ...
    Project Management Metrics, KPIs, and...
    Design and Analysis of Experiments: I...
    IES - 2013: General English
    Foundation of Software Testing: ISTQB...
    Faster: 100 Ways to Improve your Digi...
    A Textbook of Fluid Mechanics and Hyd...
    Software Engineering for Embedded Sys...
    Communication Skills for Engineers
    Making Things Move DIY Mechanisms for...
    Virtual Instrumentation Using Labview
    Geometric Dimensioning and Tolerancin...
    Power System Protection & Switchgear...
    Computer Networks
*/
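
YQL's public endpoint has since been retired by Yahoo, so the same XPath can instead be run locally with PHP's DOM extension on the HTML that cURL returns. This is only a minimal sketch: it assumes the page still wraps each item in a div with class zg_itemImmersion and that the title link sits at the same position as in the YQL result above. The function name extractTitles and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: return the link text of every bestseller item block.
// Assumes each item is a <div class="zg_itemImmersion"> and that the title
// link is at div[2]/div[2]/a. XPath positions are 1-based, so this mirrors
// $product->div[1]->div[1]->a in the SimpleXML version.
function extractTitles(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = [];
    foreach ($xpath->query('//div[@class="zg_itemImmersion"]') as $item) {
        // Query relative to the current item block.
        $links = $xpath->query('./div[2]/div[2]/a', $item);
        if ($links->length > 0) {
            $titles[] = trim($links->item(0)->textContent);
        }
    }
    return $titles;
}

// Made-up sample markup standing in for the fetched page.
$sample = <<<HTML
<html><body>
<div class="zg_itemImmersion">
  <div>1.</div>
  <div>
    <div><img src="a.jpg"></div>
    <div><a href="/a">Computer Networks</a></div>
  </div>
</div>
<div class="zg_itemImmersion">
  <div>2.</div>
  <div>
    <div><img src="b.jpg"></div>
    <div><a href="/b">The Story of Microsoft</a></div>
  </div>
</div>
</body></html>
HTML;

print_r(extractTitles($sample));
```

With the curl() function from the question, the call would be extractTitles(curl($url)). If the live markup differs from the assumed layout, only the two XPath expressions need adjusting, which tends to be less fragile than a regex over the raw HTML.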