cURL still does not fully emulate a browser; every HTTPS URL call times out

Time: 2017-11-27 20:42:43

Tags: php curl browser timeout emulation

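So I put the browser's entire set of request headers into my code, and I still get a timeout. Could the difference lie somewhere other than the $header array?

When I tried it, the extra line of code gave me this response: 'Notice: Undefined index: request_header'

Here is the updated sample: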

// CURLOPT_USERAGENT expects the value only, without the "User-Agent: " prefix.
$useragent = 'Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/57.0';

$header = [];
$header[] = 'Accept: text/html,application/xhtml+xm…plication/xml;q=0.9,*/*;q=0.8';
$header[] = 'Accept-Encoding: gzip, deflate, br';
$header[] = 'Accept-Language: en-US,en;q=0.5';
$header[] = 'Connection: keep-alive';
$header[] = 'Cookie: BUI=qkubv5up4ttnn38gyvkshqt8o5…11770704889&hd=1511770705073"';
$header[] = 'Host: www.bol.com';
$header[] = 'Upgrade-Insecure-Requests: 1';
// User-Agent is sent through CURLOPT_USERAGENT below, so it is not repeated in $header.
$ch = curl_init(); 

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

curl_setopt($ch, CURLOPT_URL, "https://bol.com" );
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_AUTOREFERER, true); 
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com'  );    
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_VERBOSE,  true   );  // new
curl_setopt($ch, CURLOPT_ENCODING, '');  //''
curl_setopt($ch, CURLOPT_TIMEOUT, 20);


curl_setopt($ch, CURLINFO_HEADER_OUT, true);

$pagedata = curl_exec($ch);

$info = curl_getinfo($ch);  
var_dump($info['request_header']); // gives 'Notice: Undefined index: request_header'
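
The notice above is consistent with how PHP seems to fill in curl_getinfo(): the 'request_header' key appears only when a request was actually sent, so a connection that times out first leaves it unset. The following is a minimal sketch, not the poster's code, that reads the key defensively and reports the cURL error alongside it. The function names `summarize_transfer` and `fetch_with_diagnostics` are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper: turn the raw results of one transfer into a summary
// without assuming that optional keys such as 'request_header' exist.
function summarize_transfer($body, array $info, int $errno, string $error): array
{
    return [
        'ok'             => $body !== false,                  // curl_exec() returns false on failure
        'errno'          => $errno,                           // 0 means no transport error
        'error'          => $error,
        'http_code'      => $info['http_code'] ?? 0,
        'request_header' => $info['request_header'] ?? null,  // unset when nothing was sent
    ];
}

// Hypothetical wrapper: perform one GET request and return the summary.
function fetch_with_diagnostics(string $url, int $timeout = 20): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLINFO_HEADER_OUT, true); // ask PHP to record the outgoing headers

    $body = curl_exec($ch);

    return summarize_transfer($body, curl_getinfo($ch), curl_errno($ch), curl_error($ch));
}

// Usage (performs a real network request, so it is left commented out):
// var_dump(fetch_with_diagnostics('https://www.bol.com'));
```

With this shape, a timed-out call should yield 'ok' => false, a non-zero 'errno', and 'request_header' => null instead of a notice.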

----> Original question

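I have been reading cURL answers for a few days now and pieced together the code below, which I thought would fully emulate a browser call. Sure enough, it works when I run it at home. At work, where I am probably behind some kind of security wall, every HTTPS call ends in a connection timeout (no response from the server at all). Oddly, the exact same URL works perfectly every time when I enter it in a browser window on the same workstation.

So my question is: with the code below, can anything in my network still detect that a call comes from cURL rather than from my real browser?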

$useragent = 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0';

$header[] = 'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
$header[] = 'Cache-Control: max-age=0';
$header[] = 'Connection: keep-alive';
$header[] = 'Keep-Alive: 300';
$header[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
$header[] = 'Accept-Language: en-us,en;q=0.5';
$header[] = ''; // 'Pragma: '; // browsers usually leave this blank. 
$ch = curl_init(); 

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSLVERSION, 3); // 3 is CURL_SSLVERSION_SSLv3; many servers no longer accept SSLv3

curl_setopt($ch, CURLOPT_URL, $url );
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_AUTOREFERER, true); 
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com'  );    
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_VERBOSE,  true   );  // new
curl_setopt($ch, CURLOPT_ENCODING, '');  //''
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$pagedata = curl_exec($ch);

As mentioned above, the verbose output only shows the IP address timing out: no error and no response from the server, whatever I try.
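
To see exactly how far a connection attempt gets, the verbose log can be captured into a variable instead of being left on stderr. The sketch below does that and maps a few well-known libcurl error numbers to short hints. It also accepts an optional proxy address: one possible explanation, which the post does not confirm, is that the workplace browser goes through a proxy that the PHP script does not use. The function names `describe_curl_errno` and `probe` and the proxy address in the usage comment are hypothetical.

```php
<?php
// Hypothetical helper: translate a few libcurl error numbers into short hints.
function describe_curl_errno(int $errno): string
{
    switch ($errno) {
        case 0:  return 'no transport error';
        case 5:  return 'could not resolve proxy';  // CURLE_COULDNT_RESOLVE_PROXY
        case 6:  return 'could not resolve host';   // CURLE_COULDNT_RESOLVE_HOST
        case 7:  return 'could not connect';        // CURLE_COULDNT_CONNECT
        case 28: return 'operation timed out';      // CURLE_OPERATION_TIMEDOUT
        case 35: return 'TLS handshake failed';     // CURLE_SSL_CONNECT_ERROR
        default: return 'other error ' . $errno;
    }
}

// Hypothetical probe: try one request and return the error number, a hint,
// and the verbose log. $proxy is an assumption; pass null to connect directly.
function probe(string $url, ?string $proxy = null): array
{
    $log = fopen('php://temp', 'w+'); // in-memory stream that receives the verbose log

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 20,
        CURLOPT_VERBOSE        => true,
        CURLOPT_STDERR         => $log,
    ]);
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }

    curl_exec($ch);
    $errno = curl_errno($ch);

    rewind($log);
    $verbose = stream_get_contents($log);

    return ['errno' => $errno, 'hint' => describe_curl_errno($errno), 'verbose' => $verbose];
}

// Usage (performs real network requests, so it is left commented out):
// print_r(probe('https://www.bol.com'));
// print_r(probe('https://www.bol.com', 'http://proxy.example.internal:8080'));
```

If the direct probe times out while the proxied one succeeds, the difference is in the network path rather than in the request headers. CURLINFO_HEADER_OUT is deliberately left out here, since in PHP it appears to take over the debug output that CURLOPT_VERBOSE would otherwise write.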

0 answers:

No answers