Question

I wrote the following code to fetch the HTML of a URL. It works for HTTPS sites such as Facebook, but not for Instagram: Instagram returns a blank page.

(function() {
    $('form input').keyup(function() {
        var empty = false;
        $('form input').each(function() {
            if ($(this).val() == '') {
                empty = true;
            }
        });
        if (empty) {
            $('#register').attr('disabled', 'disabled');
        } else {
            $('#register').removeAttr('disabled');
        }
    });
})();

Answer 1

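Instagram returns only JavaScript, and it loads that JavaScript through relative paths, so the browser cannot render the page: <script src='/path/file.js'> tries to fetch localhost/path/file.js instead of instagram.com/path/file.js. Since localhost/path/file.js does not exist, the page stays blank.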
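To illustrate the path problem described above: a browser resolves a root-relative URL such as /path/file.js against the scheme, host and port of the page that contains it. The helper below, resolve_root_relative (a hypothetical name, not part of either answer), mimics just that one rule with parse_url(). It is a sketch that ignores <base> tags and other relative forms such as ../file.js.

```php
<?php
// Hypothetical helper: resolve a root-relative path (starting with "/")
// against the origin (scheme://host[:port]) of the page that contains it.
function resolve_root_relative(string $pageUrl, string $path): string
{
    $parts = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    return $origin . $path;
}

// The same src attribute points at different hosts depending on
// which server delivered the HTML.
echo resolve_root_relative('https://www.instagram.com/', '/path/file.js'), "\n";
// https://www.instagram.com/path/file.js
echo resolve_root_relative('http://localhost/fetch.php', '/path/file.js'), "\n";
// http://localhost/path/file.js
```

When your PHP script on localhost echoes Instagram's HTML, the page's origin becomes localhost, so every root-relative script path resolves to a file that does not exist there.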

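One solution is to get the server to send the full HTML instead of JavaScript, and the "User-Agent" header does the trick here. As you may know, search engine crawlers generally do not process JavaScript, so Instagram (like many sites) serves bots a version of the page that does not depend on JS.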

So, add:

curl_setopt($ch, CURLOPT_USERAGENT, "ABACHOBot");

"ABACHOBot" is a crawler. On this page you can find many other alternatives, such as "Baiduspider", "BecomeBot"...

Generic user agents such as "bot", "spider" or "crawler" may also work.
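Trying several crawler User-Agent strings by hand can be automated. In the sketch below, fetch_with_user_agents and curl_fetch are hypothetical names: the first tries each string in turn and returns the first non-blank body, and the second is a plain cURL fetcher that sends the given User-Agent. It assumes, as the answer does, that the site varies its response by User-Agent; that may no longer hold for Instagram today.

```php
<?php
// Hypothetical helper: try each User-Agent in turn and return the first
// non-blank body. $fetch is any callable taking ($url, $userAgent) and
// returning a string, or false on failure.
function fetch_with_user_agents(string $url, array $userAgents, callable $fetch): ?string
{
    foreach ($userAgents as $userAgent) {
        $body = $fetch($url, $userAgent);
        if (is_string($body) && trim($body) !== '') {
            return $body;
        }
    }
    return null; // every User-Agent produced an empty or failed response
}

// Hypothetical cURL-based fetcher that sends the given User-Agent header.
function curl_fetch(string $url, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    $body = curl_exec($ch); // string on success, false on failure
    curl_close($ch);
    return $body;
}

// Usage (performs network requests):
// $html = fetch_with_user_agents(
//     'https://www.instagram.com',
//     ['ABACHOBot', 'Baiduspider', 'BecomeBot', 'bot', 'spider', 'crawler'],
//     'curl_fetch'
// );
```

Passing the fetcher in as a callable keeps the retry loop independent of cURL, so it can be checked with a stub that returns canned HTML.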

Answer 2

Try this:

<?php
$url = 'https://www.instagram.com';
$returned_content = get_data($url);
print_r($returned_content);

/* Gets the data from a URL. */
function get_data($url) {
  $ch = curl_init();
  $timeout = 5;
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  // Added: send a crawler-style User-Agent.
  curl_setopt($ch, CURLOPT_USERAGENT, 'spider');
  // Added: skip TLS certificate verification (insecure: exposes the
  // connection to man-in-the-middle attacks).
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
  // Return only the body, without the response headers.
  curl_setopt($ch, CURLOPT_HEADER, false);
  $data = curl_exec($ch);
  curl_close($ch);
  return $data; // false if the request failed
}
?>

You should pass curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false) along with the other options above. For details, see http://stackoverflow.com/questions/4372710/php-curl-https
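A blank page does not always mean the site sent nothing: curl_exec() returns false when the request fails (for example on a TLS certificate verification error), and print_r(false) prints nothing, so a failed request looks exactly like an empty page. The sketch below, get_data_or_fail (a hypothetical variant of get_data() above), keeps certificate verification on by default and reports curl_error() instead of returning false. If verification is what fails, pointing CURLOPT_CAINFO at an up-to-date CA bundle is the usual alternative to switching the check off.

```php
<?php
// Hypothetical variant of get_data(): a failed request raises an exception
// carrying cURL's error message instead of returning false.
function get_data_or_fail(string $url, bool $verifyPeer = true): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_USERAGENT      => 'spider',
        // Certificate checks stay on unless the caller explicitly opts out.
        CURLOPT_SSL_VERIFYPEER => $verifyPeer,
    ]);
    $data = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($data === false) {
        throw new RuntimeException("cURL request failed: $error");
    }
    return $data;
}

// print_r(false) produces an empty string, which is why a failed
// curl_exec() is indistinguishable from a blank page.
var_dump(print_r(false, true)); // string(0) ""

// Usage (performs a network request):
// echo get_data_or_fail('https://www.instagram.com');
```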

Why does Instagram return a blank response to a cURL request?
