Unable to scrape Booking search results for a query

Asked: 2014-07-25 17:38:10

Tags: php curl web-scraping

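I am new to web scraping with PHP, though not to PHP itself. My problem is not regex-related; it seems to be tied directly to the booking.com website. I want to scrape hotel prices for a specific city. To do that, I copied the URL of the Booking page from my browser and pasted it into my code.

This is the page.

Here is my code: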

    <?php

    function getHTML($url, $timeout)
    {
        $ch = curl_init($url); // initialize curl with the given URL
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // forward the visitor's user agent
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to wait while connecting
        curl_setopt($ch, CURLOPT_FAILONERROR, 1); // treat HTTP status codes >= 400 as errors
        return @curl_exec($ch);
    }

    $html = getHTML("http://www.booking.com/searchresults.en.html?dcid=1;checkin_monthday=25;checkin_year_month=2014-7;checkout_monthday=26;checkout_year_month=2014-7;city=-1461464;class_interval=1;csflt=%7B%7D;interval_of_time=undef;no_rooms=1;or_radius=0;property_room_info=1;review_score_group=empty;score_min=0;src=city;ssb=empty;;nflt=ht_id%3D204%3Bclass%3D3%3B;unchecked_filter=class", 10);

    echo $html;

    ?>

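The Booking page does get printed, but it ignores the parameters in the URL: the page I get back asks me for the booking dates and the city...

I tried pasting this URL into several browsers and into an incognito window (to check whether the URL depends on cookies or something similar), and it works fine. Maybe I am missing an option in my cURL request...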
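One way to rule out the URL itself is to list the parameters it carries and compare them with what the returned page shows. The following is only a minimal sketch, not part of the original post: `parseQuery` is a hypothetical helper name, the URL is a shortened copy of the one above, and the sketch assumes the `;` characters act as parameter separators (PHP's own `parse_str` splits on `&` by default).

```php
<?php
// Hypothetical helper (not from the original post): split a query string
// that uses ";" and/or "&" as separators into a name => value map.
function parseQuery(string $url): array
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return [];
    }

    $params = [];
    // PREG_SPLIT_NO_EMPTY drops the empty piece produced by ";;".
    foreach (preg_split('/[;&]/', $query, -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        $parts = explode('=', $pair, 2);
        $name  = urldecode($parts[0]);
        $value = isset($parts[1]) ? urldecode($parts[1]) : '';
        $params[$name] = $value;
    }
    return $params;
}

// Shortened version of the URL from the question, for illustration only.
$url = "http://www.booking.com/searchresults.en.html?checkin_monthday=25;checkin_year_month=2014-7;city=-1461464;;nflt=ht_id%3D204%3Bclass%3D3%3B";
$params = parseQuery($url);
print_r($params);
```

If the dates and the city show up here as expected, the URL is probably fine and the difference is more likely in what the request sends alongside it.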

1 answer:

Answer 0 (score: 5)

I finally solved the problem. I just needed to send the browser's headers to the site. Here is the code:

    function getHTML($url, $timeout)
    {
        // Request headers similar to the ones a browser sends
        $header = array();
        $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
        $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
        $header[] = "Cache-Control: max-age=0";
        $header[] = "Connection: keep-alive";
        $header[] = "Keep-Alive: 300";
        $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $header[] = "Accept-Language: en-us,en;q=0.5";
        $header[] = "Pragma: "; // browsers keep this blank.

        $ch = curl_init($url); // initialize curl with the given URL
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // forward the visitor's user agent
        curl_setopt($ch, CURLOPT_HTTPHEADER, $header); // send the headers defined above
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to wait while connecting
        curl_setopt($ch, CURLOPT_FAILONERROR, 1); // treat HTTP status codes >= 400 as errors
        curl_setopt($ch, CURLOPT_COOKIESESSION, true); // start a new cookie session
        return @curl_exec($ch);
    }

    $url = "booking url...";
    echo getHTML($url, 10);

The solution was simple...
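For anyone adapting this, here is a hedged sketch of the same idea with two changes: the headers are built from a name => value map, and a failed request throws instead of being silenced with `@`. The names `buildHeaderLines` and `fetchHtml` are hypothetical, the header values are only examples, and the user agent is passed in explicitly because `$_SERVER["HTTP_USER_AGENT"]` is not set when the script runs from the command line.

```php
<?php
// Hypothetical helper: turn a name => value map into the "Name: value"
// lines that CURLOPT_HTTPHEADER expects.
function buildHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Hypothetical variant of getHTML() that reports failures instead of hiding them.
function fetchHtml(string $url, int $timeout, array $headers, string $userAgent): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HTTPHEADER     => buildHeaderLines($headers),
        CURLOPT_RETURNTRANSFER => true,     // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects, if any
        CURLOPT_CONNECTTIMEOUT => $timeout, // max. seconds to wait while connecting
        CURLOPT_FAILONERROR    => true,     // treat HTTP status codes >= 400 as errors
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Example header values (assumptions; adjust to match your own browser).
$headers = [
    'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language' => 'en-us,en;q=0.5',
];
$lines = buildHeaderLines($headers);

// Usage, with your own URL and user agent string:
// echo fetchHtml($url, 10, $headers, $userAgent);
```

Keeping the headers in a map makes it easier to add or drop one at a time when working out which of them the site actually cares about.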