PHP cURL returns empty headers and body despite HTTP code 200

Date: 2015-05-12 11:35:27

Tags: php string http curl

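I'm trying to scrape this URL, xxxx.fr, with cURL, but I can't get at the page's HTML: the headers and the body both come back empty, even though the HTTP code is 200. I tried other URLs (on different domains) and it works like a charm. I also tried different User-Agent and Referer values.

Do you know what is wrong? Could someone at least try this code on their own server and tell me whether they run into the same problem?

Thanks

Here is my code: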

  $url = 'http://www.xxxx.fr';

  $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
  $header[] = "Cache-Control: max-age=0";
  $header[] = "Connection: keep-alive";
  $header[] = "Keep-Alive: timeout=5, max=100";
  $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
  $header[] = "Accept-Language: en-us,en;q=0.5";
  $header[] = ""; // BROWSERS USUALLY LEAVE BLANK

  $curl = curl_init ();
  curl_setopt($curl, CURLOPT_URL, $url);
  curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
  curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0");
  curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
  curl_setopt($curl, CURLOPT_REFERER, "http://www.google.fr");
  curl_setopt($curl, CURLOPT_HEADER, 1);
  curl_setopt($curl, CURLINFO_HEADER_OUT, 1);
  curl_setopt($curl, CURLOPT_VERBOSE, 1);
  curl_setopt($curl, CURLOPT_COOKIEFILE, getcwd().'/cookies.txt');
  curl_setopt($curl, CURLOPT_COOKIEJAR, getcwd().'/cookies.txt');
  curl_setopt($curl, CURLOPT_TIMEOUT, 30);
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
  $curlData = curl_exec($curl);

  $infos = curl_getinfo($curl);
  print_r($infos);

  curl_close ( $curl );

  echo "<hr>Page:<br />";
  echo htmlentities($curlData);

Here is the output of print_r($infos):

Array ( 
[url] => http://www.xxxx.fr 
[content_type] => text/html 
[http_code] => 200 
[header_size] => 625 
[request_size] => 465 
[filetime] => -1 
[ssl_verify_result] => 0
[redirect_count] => 0 
[total_time] => 0.032535 
[namelookup_time] => 0.001488 
[connect_time] => 0.002581 
[pretransfer_time] => 0.002639 
[size_upload] => 0 
[size_download] => 10234 
[speed_download] => 314553 
[speed_upload] => 0 
[download_content_length] => -1 
[upload_content_length] => 0 
[starttransfer_time] => 0.032088 
[redirect_time] => 0 
[certinfo] => Array ( ) 
[primary_ip] => xxx 
[primary_port] => 80 
[local_ip] => xxx 
[local_port] => 37319 
[redirect_url] => 
[request_header] => GET / HTTP/1.1 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0 Host: www.xxxx.fr Accept-Encoding: gzip,deflate Referer: http://www.google.fr Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Cache-Control: max-age=0 Connection: keep-alive Keep-Alive: timeout=5, max=100 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Accept-Language: en-us,en;q=0.5 
) 

1 Answer:

Answer 0 (score: 3)

// EDIT

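htmlentities($curlData) returns an empty string because the page source is not encoded as UTF-8 (see this link). The response itself is not empty: size_download in your output is 10234 bytes.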
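One way to confirm this diagnosis is to check that the raw response is not empty and that its body is not valid UTF-8. The following is only a sketch: `splitResponse()` and `isValidUtf8()` are hypothetical helpers, and the sample response is made up so that the snippet runs without a network call. In the real script, `$raw` would be `$curlData` and the header size would come from `$infos['header_size']`.

```php
<?php
// Hypothetical helper: split a response fetched with CURLOPT_HEADER = 1
// into its header block and its body, using header_size from curl_getinfo().
function splitResponse($raw, $headerSize)
{
    return array(substr($raw, 0, $headerSize), substr($raw, $headerSize));
}

// Hypothetical helper: preg_match() with the /u modifier returns false
// when the subject string is not valid UTF-8.
function isValidUtf8($text)
{
    return preg_match('//u', $text) === 1;
}

// Made-up sample standing in for $curlData; "\xE9" is "é" in ISO-8859-1.
$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
$raw = $headers . "<p>caf\xE9</p>";

list($head, $body) = splitResponse($raw, strlen($headers));

var_dump(strlen($body) > 0);  // is there a body at all?
var_dump(isValidUtf8($body)); // is that body valid UTF-8?
```

If the first check is true and the second is false, the data arrived intact and the problem lies in how it is escaped for display.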

This should work:

 htmlentities($curlData, ENT_QUOTES,'ISO-8859-1' );
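If the rest of the page is handled as UTF-8, an alternative is to convert the response to UTF-8 first and then escape it as UTF-8. This is a sketch under two assumptions: the mbstring extension is available, and the source really is ISO-8859-1 (the Content-Type in the question's output does not state a charset). `toUtf8()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting from $sourceCharset
// only when the input is not already valid UTF-8.
function toUtf8($text, $sourceCharset = 'ISO-8859-1')
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $sourceCharset);
}

$latin1 = "caf\xE9";     // "café" encoded as ISO-8859-1
$utf8 = toUtf8($latin1); // the same text re-encoded as UTF-8

// The input is now valid UTF-8, so htmlentities() can escape it.
echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'), "\n";
```

Converting once, right after curl_exec(), keeps every later string function working on a single known encoding.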
  

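As of PHP 5.4, htmlspecialchars() no longer uses ISO-8859-1 as its default encoding; it uses UTF-8, and the same default applies to htmlentities(). You might expect htmlspecialchars() to simply skip byte sequences that are not valid UTF-8, or to convert them to a "not found" character. In fact, when it is passed an invalid UTF-8 sequence, htmlspecialchars() returns an empty string: no error is generated, no error code is returned, and no exception is thrown.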
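The behaviour described above can be reproduced with a single ISO-8859-1 byte. The snippet below is a minimal sketch with a made-up sample string. It passes the flags and charset explicitly so the result does not depend on the PHP version's defaults (as far as I know, PHP 8.1 added ENT_SUBSTITUTE to the default flags, which changes what a bare htmlentities($string) call returns for invalid input).

```php
<?php
// "café" in ISO-8859-1; the lone byte "\xE9" is not valid UTF-8.
$latin1 = "caf\xE9";

// Treated as UTF-8 without ENT_SUBSTITUTE: the whole result is an empty string.
$asUtf8 = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

// Treated as ISO-8859-1: the byte is recognised and turned into an entity.
$asLatin1 = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

// Treated as UTF-8 with ENT_SUBSTITUTE: the invalid byte is replaced
// instead of the whole string being discarded.
$substituted = htmlentities($latin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($asUtf8 === '');       // true
var_dump($asLatin1);            // "caf&eacute;"
var_dump($substituted !== '');  // true
```

ENT_SUBSTITUTE keeps the output non-empty but loses the original character, so naming the correct charset (or converting to UTF-8 beforehand) is the better fix when the source encoding is known.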