Problem scraping a website with PHP cURL

Date: 2018-04-16 21:06:36

Tags: php curl web-scraping

I want to get the names of all hotels in the city of Riyadh from the TripAdvisor website. This is my code:

require_once(APPPATH."../simple_html_dom.php");

      $postfields = array(
          "sl_opp_json" => "%7B%22HOTELS_AB_SLOT_0%22%3A%22eb6ce073-5ea4-4182-ab18-9d656fe9bbc5%22%2C%22HOTELS_SLOT_0%22%3A%225c1f95f2-50aa-4c1b-89ea-f957ea5607fd%22%7D",
          "plSeed" => "845179989",
          "showSnippets" =>"false",
          "offset" =>"120",
          "reqNum" =>"2",
          "changeSet" =>"",
          "puid" =>"WtT-4QoQJX8AAmsw3QgAAABC"
      );
      $ch = curl_init();
      curl_setopt($ch,CURLOPT_URL,"https://www.tripadvisor.fr/Hotels-g293995-Riyadh_Riyadh_Province-Hotels.html");
      curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
      curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
      curl_setopt($ch,CURLOPT_POST,1);
      curl_setopt($ch,CURLOPT_POSTFIELDS,http_build_query($postfields));
      $response = curl_exec($ch);
      curl_close($ch);
      echo $response;

But `$response` contains this message:

"Votre requête n'est pas valide. Merci d'envoyer une requête HTTP valide."

"Your request is not valid. Please send a valid HTTP request."

Can anyone help? Thanks in advance.

1 answer:

Answer 0 (score: 0)

I don't condone what you are doing, since it appears to violate their policy. You can find their API here: https://developer-tripadvisor.com/content-api/description/

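That said, if you are open to doing some extra parsing, you can try the following as a starting point (hint: use GET to return the page as is, then write your own rules to parse it).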

    <?php

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "https://www.tripadvisor.fr/Hotels-g293995-Riyadh_Riyadh_Province-Hotels.html");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // The option that forces a GET request is CURLOPT_HTTPGET; CURLOPT_GET is not a defined constant.
    curl_setopt($ch, CURLOPT_HTTPGET, 1);
    //curl_setopt($ch,CURLOPT_POSTFIELDS,http_build_query($postfields));
    $response = curl_exec($ch);
    curl_close($ch);
    echo "<pre><!--"; // remove this (and the closing echo below) and do whatever you want with the result page
    var_dump($response);
    echo "--></pre>";
    ?>
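
As a rough illustration of the "write your own rules to parse it" step, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The sample markup, the `property_title` class name, and the `extractHotelNames` helper are all assumptions made up for this example; the real page structure is different and changes over time, so the XPath rule would need adapting after inspecting the actual response.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
// In practice you would pass the $response string returned by curl_exec().
$html = '<html><body>'
    . '<div class="listing"><a class="property_title" href="/h1">Hotel One</a></div>'
    . '<div class="listing"><a class="title property_title" href="/h2">Hotel Two &amp; Spa</a></div>'
    . '<div class="listing"><a class="other" href="/x">Not a hotel link</a></div>'
    . '</body></html>';

/**
 * Return the trimmed text of every <a> element carrying the given class.
 * The class name is an assumption; adjust it to the markup you actually receive.
 */
function extractHotelNames(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    // The XML prolog prefix is a common trick to make loadHTML treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $names = [];
    foreach ($xpath->query($query) as $node) {
        $names[] = trim($node->textContent);
    }
    return $names;
}

print_r(extractHotelNames($html, 'property_title'));
```

Using DOMXPath rather than regular expressions keeps the rule readable and tolerant of attribute order and whitespace, but it only sees what is in the HTML returned by the server; anything the site loads later via JavaScript would not be present in `$response`.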