Question

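I am very new to cURL and have only used it for a short time. My problem is that I want to fetch the content of a page with cURL (file_get_contents() doesn't work). Unfortunately, the site has bot protection: the first time you arrive, it checks whether you are a bot. If you are not, it redirects you to the real site using an absolute path (I guess). Whenever I load this site with cURL, that path gets appended to my own server's address.

For example, my server's address is http://examplepage.com/ and cURL appends the redirect path to my URL, so it ends up as: http://examplepage.com/absolute/path?with=parameters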

That path exists on the original site I am trying to fetch content from, but not on my server (I only want the HTML content of that site).

This is my code so far:

    <?php

    /* getting site */
    $website = "https://originalsite.com/?some=parameters";

    function curl_download($url) {
        // Initialize the cURL handle
        $c = curl_init();

        // Include the response headers in the result? (1 = yes, 0 = no)
        curl_setopt($c, CURLOPT_HEADER, 1);

        // Set the URL to download
        curl_setopt($c, CURLOPT_URL, $url);

        // Follow HTTP redirects (Location headers)
        curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);

        // Set the referer
        curl_setopt($c, CURLOPT_REFERER, "https://originalsite.com/");

        // User agent
        curl_setopt($c, CURLOPT_USERAGENT, "MozillaXYZ/1.0");

        // Should cURL return or print the data? (true = return, false = print)
        curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);

        // Timeout in seconds
        curl_setopt($c, CURLOPT_TIMEOUT, 10);

        // Download the given URL and return the output (false on failure)
        $output = curl_exec($c);

        // Close the cURL handle and free system resources
        curl_close($c);

        return $output;
    }

    $content = curl_download($website);

    echo $content;

    ?>

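So it goes to the page that checks whether I am a bot, and after that it redirects me to the real site (or at least it tries to).

I have searched the internet and Stack Overflow, but I could not find an answer to my problem.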

Answer 1

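What is happening is that, once the page is rendered, some JavaScript code in it issues the redirect. cURL does not execute JavaScript, so the script only runs when your own page echoes the fetched HTML to the browser, and the browser then resolves the redirect path against your server's address. Try disabling JavaScript in your browser as a quick test.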
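If that turns out to be the cause, one possible workaround is to find the redirect target in the fetched HTML yourself and resolve it against the original site before requesting it again. The following is only a rough sketch: `find_client_redirect()` and `resolve_url()` are hypothetical helper names, the regular expressions only cover a `<meta http-equiv="refresh">` tag and simple `window.location` assignments, and I am assuming the page really uses one of those forms.

```php
<?php
// Hypothetical helpers for illustration; not part of the question's code.

/**
 * Look for a client-side redirect in fetched HTML.
 * Returns the target exactly as written in the page, or null if none is found.
 */
function find_client_redirect(string $html): ?string
{
    // e.g. <meta http-equiv="refresh" content="0; url=/absolute/path">
    $meta = '/<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']\s*\d+\s*;\s*url=([^"\'>]+)/i';
    if (preg_match($meta, $html, $m)) {
        return trim($m[1]);
    }

    // e.g. window.location = "/p", location.href = '/p', location.replace("/p")
    $js = '/location(?:\.href)?\s*(?:=|\.(?:replace|assign)\s*\()\s*["\']([^"\']+)["\']/i';
    if (preg_match($js, $html, $m)) {
        return $m[1];
    }

    return null;
}

/**
 * Resolve a redirect target against the URL of the page it came from.
 * Simplified: does not normalise "../" or "./" segments.
 */
function resolve_url(string $base, string $target): string
{
    if (preg_match('#^https?://#i', $target)) {
        return $target; // already a full URL
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    if (strpos($target, '//') === 0) {
        return $parts['scheme'] . ':' . $target; // protocol-relative
    }
    if (strpos($target, '/') === 0) {
        return $origin . $target; // root-relative path
    }

    // Path-relative: resolve against the directory of the base path
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $target;
}

// Example with a made-up response body
$html   = '<script>window.location.href = "/absolute/path?with=parameters";</script>';
$target = find_client_redirect($html);

if ($target !== null) {
    echo resolve_url('https://originalsite.com/?some=parameters', $target), "\n";
    // prints https://originalsite.com/absolute/path?with=parameters
}
```

You could then pass the resolved URL to your `curl_download()` function for a second request. If the bot check also sets cookies or computes a token in JavaScript, this alone will probably not be enough.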
