I want to make HTTP requests without depending on cURL or allow_url_fopen = 1, by opening a socket connection and sending a raw HTTP request:
/**
* Make HTTP GET request
*
* @param string the URL
* @param int will be filled with HTTP response status code
* @param string will be filled with HTTP response header
* @return string HTTP response body
*/
function http_get_request($url, &$http_code = '', &$res_head = '')
{
$scheme = $host = $user = $pass = $query = $fragment = '';
$path = '/';
$port = substr($url, 0, 5) == 'https' ? 443 : 80;
extract(parse_url($url));
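// Append the query string only; the fragment is never sent to the server.
$path .= ($query !== '' ? "?$query" : '');
// Send credentials only when the URL actually contains a user name.
$auth = $user !== '' ? "Authorization: Basic ".base64_encode("$user:$pass")."\r\n" : '';
$head = "GET $path HTTP/1.1\r\n"
. "Host: $host\r\n"
. $auth
. "Connection: close\r\n\r\n";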
$fp = fsockopen($scheme == 'https' ? "ssl://$host" : $host, $port) or
die('Cannot connect!');
fputs($fp, $head);
$res = ''; // initialize the buffer before appending to it
while (!feof($fp)) {
$res .= fgets($fp, 4096);
}
fclose($fp);
list($res_head, $res_body) = explode("\r\n\r\n", $res, 2);
list(, $http_code, ) = explode(' ', $res_head, 3);
return $res_body;
}
The function works fine, but because I am using HTTP/1.1, the response body is usually returned as a chunked-encoded string. For example (from Wikipedia):
25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
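I don't want to use http_chunked_decode(), because it has a PECL dependency and I want highly portable code.
How can I easily decode an HTTP chunked-encoded string so that my function returns the original HTML? I also have to make sure that the length of the decoded string matches the Content-Length: header.
Any help would be appreciated. Thanks.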
Answer 0: (score: 10)
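Since the function returns the HTTP response header, you should check whether 'Transfer-Encoding' is 'chunked' and, if so, decode the chunked-encoded string.
In pseudocode: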
CALL parse_http_header
IF 'Transfer-Encoding' IS 'chunked'
CALL decode_chunked
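The check in the pseudocode above can be sketched directly against the raw header string. This is a minimal sketch, not part of the original answer: the name is_chunked_response is hypothetical, and it assumes the header block is the $res_head string returned by http_get_request. It matches case-insensitively because HTTP header names and the chunked token are not case-sensitive.

```php
<?php
// Hypothetical helper: reports whether a raw response header block
// declares "Transfer-Encoding: chunked".
function is_chunked_response($res_head)
{
    // The 'm' flag lets ^ match at the start of every header line,
    // and the 'i' flag makes the header name and token case-insensitive.
    return (bool) preg_match('/^Transfer-Encoding:.*\bchunked\b/mi', $res_head);
}

$res_head = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\ntransfer-encoding: Chunked";
var_dump(is_chunked_response($res_head)); // bool(true)
```

The body would then be passed to the decoder only when this returns true.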
Parsing the HTTP response header:
Below is a function that parses the HTTP response header into an associative array.
function parse_http_header($str)
{
$lines = explode("\r\n", $str);
$head = array(array_shift($lines));
foreach ($lines as $line) {
list($key, $val) = explode(':', $line, 2);
if ($key == 'Set-Cookie') {
$head['Set-Cookie'][] = trim($val);
} else {
$head[$key] = trim($val);
}
}
return $head;
}
The function returns an array like this:
Array
(
[0] => HTTP/1.1 200 OK
[Expires] => Tue, 31 Mar 1981 05:00:00 GMT
[Content-Type] => text/html; charset=utf-8
[Transfer-Encoding] => chunked
[Set-Cookie] => Array
(
[0] => k=10.34; path=/; expires=Sat, 09-Jun-12 01:58:23 GMT; domain=.example.com
[1] => guest_id=v1%3A13; domain=.example.com; path=/; expires=Mon, 02-Jun-2014 13:58:23 GMT
)
[Content-Length] => 43560
)
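Notice how the Set-Cookie header is parsed into an array. You will need to parse the cookies later to associate each URL with the cookies that need to be sent.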
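As a rough illustration of that later cookie-parsing step, one Set-Cookie value can be split into its name, value and attributes. This is only a sketch: the function name parse_set_cookie and the shape of the returned array are assumptions, and it does not implement domain or path matching.

```php
<?php
// Hypothetical helper: splits one Set-Cookie value into name, value and attributes.
function parse_set_cookie($line)
{
    $parts = array_map('trim', explode(';', $line));
    // The first segment is always the "name=value" pair.
    $pair = explode('=', array_shift($parts), 2);
    $cookie = array(
        'name'  => $pair[0],
        'value' => isset($pair[1]) ? $pair[1] : '',
        'attrs' => array(),
    );
    foreach ($parts as $part) {
        if ($part === '') {
            continue;
        }
        $kv = explode('=', $part, 2);
        // Attribute names are case-insensitive; flags such as Secure carry no value.
        $cookie['attrs'][strtolower($kv[0])] = isset($kv[1]) ? $kv[1] : true;
    }
    return $cookie;
}

print_r(parse_set_cookie('k=10.34; path=/; expires=Sat, 09-Jun-12 01:58:23 GMT; domain=.example.com'));
```

The domain and path attributes are what you would compare against the request URL when deciding which cookies to send.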
Decoding the chunked-encoded string
The function below takes the chunked-encoded string as its argument and returns the decoded string.
function decode_chunked($str) {
for ($res = ''; !empty($str); $str = trim($str)) {
$pos = strpos($str, "\r\n");
$len = hexdec(substr($str, 0, $pos));
$res.= substr($str, $pos + 2, $len);
$str = substr($str, $pos + 2 + $len);
}
return $res;
}
// Given the string in the question, the function above returns:
//
// This is the data in the first chunk
// and this is the second one
// consequence
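The function above relies on trim() and empty() to walk the string, so it does not notice truncated input or chunk extensions (a size line such as "25;name=value"). If that matters, a stricter variant could look like the sketch below. The name decode_chunked_strict and the choice to return false on malformed input are assumptions, not part of the original answer; trailer fields after the last chunk are simply ignored.

```php
<?php
// Hypothetical stricter decoder: returns the decoded body,
// or false when the input is not well-formed chunked data.
function decode_chunked_strict($str)
{
    $res = '';
    $offset = 0;
    while (true) {
        $eol = strpos($str, "\r\n", $offset);
        if ($eol === false) {
            return false; // the chunk-size line is not terminated
        }
        // Drop any chunk extension (";name=value") before parsing the size.
        $parts = explode(';', substr($str, $offset, $eol - $offset), 2);
        $size_field = trim($parts[0]);
        if (!preg_match('/^[0-9a-fA-F]+$/', $size_field)) {
            return false; // the size is not a hexadecimal number
        }
        $len = hexdec($size_field);
        $offset = $eol + 2;
        if ($len === 0) {
            return $res; // last chunk; trailer fields, if any, are ignored
        }
        // Every chunk must be followed by CRLF; this also catches truncation.
        if (substr($str, $offset + $len, 2) !== "\r\n") {
            return false;
        }
        $res .= substr($str, $offset, $len);
        $offset += $len + 2;
    }
}

var_dump(decode_chunked_strict("4;name=val\r\nWiki\r\n0\r\n\r\n")); // string(4) "Wiki"
var_dump(decode_chunked_strict("5\r\nabc"));                        // bool(false)
```

Tracking an offset instead of repeatedly trimming the string means whitespace inside the chunk data is preserved exactly.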
Answer 1: (score: 2)
I don't know whether this is the best option for what you need, but if you specify HTTP/1.0 instead of HTTP/1.1, you won't get a chunked response.
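As a small sketch of that suggestion, the request head can be built with the protocol version as a parameter. The helper name build_get_head is hypothetical. The Host header is not required by HTTP/1.0, but it is kept here on the assumption that the server uses name-based virtual hosts.

```php
<?php
// Hypothetical helper: builds the head of a GET request for a given HTTP version.
function build_get_head($host, $path, $version = '1.0')
{
    return "GET $path HTTP/$version\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n\r\n";
}

echo build_get_head('example.com', '/index.html');
```

A server should not use chunked encoding when answering an HTTP/1.0 request, but checking the Transfer-Encoding header anyway is a cheap safeguard against servers that misbehave.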
Answer 2: (score: 1)
This function is used in WordPress.
function decode_chunked($data) {
if (!preg_match('/^([0-9a-f]+)(?:;(?:[\w-]*)(?:=(?:(?:[\w-]*)*|"(?:[^\r\n])*"))?)*\r\n/i', trim($data))) {
return $data;
}
$decoded = '';
$encoded = $data;
while (true) {
$is_chunked = (bool) preg_match('/^([0-9a-f]+)(?:;(?:[\w-]*)(?:=(?:(?:[\w-]*)*|"(?:[^\r\n])*"))?)*\r\n/i', $encoded, $matches);
if (!$is_chunked) {
// Looks like it's not chunked after all
return $data;
}
$length = hexdec(trim($matches[1]));
if ($length === 0) {
// Ignore trailer headers
return $decoded;
}
$chunk_length = strlen($matches[0]);
$decoded .= substr($encoded, $chunk_length, $length);
$encoded = substr($encoded, $chunk_length + $length + 2);
if (trim($encoded) === '0' || empty($encoded)) {
return $decoded;
}
}
// We'll never actually get down here
// @codeCoverageIgnoreStart
}