Problem fetching an HTTPS link

Time: 2011-12-04 22:23:38

Tags: php https request http-status-code-301

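For the past few days I have been trying to fetch a page from a website, without success: I keep getting a 301 error. Could someone help me grab the contents of this page: https://pre.corrupt-net.org/search.php?search=Lasse_Stefanz-Bara_Du-SE-CD-FLAC-1995-LoKET

I look forward to your replies.

Edit: this is the PHP function I used: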

function http_request(
    $verb = 'GET',             /* HTTP Request Method (GET and POST supported) */
    $ip,                       /* Target IP/Hostname */
    $port = 80,                /* Target TCP port */
    $uri = '/',                /* Target URI */
    $getdata = array(),        /* HTTP GET data, e.g. array('var1' => 'val1', 'var2' => 'val2') */
    $postdata = array(),       /* HTTP POST data, e.g. array('var1' => 'val1', 'var2' => 'val2') */
    $cookie = array(),         /* HTTP cookie data, e.g. array('var1' => 'val1', 'var2' => 'val2') */
    $custom_headers = array(), /* Custom HTTP headers, e.g. array('Referer' => 'http://localhost/') */
    $timeout = 1000,           /* Socket timeout in milliseconds */
    $req_hdr = false,          /* Include HTTP request headers */
    $res_hdr = false           /* Include HTTP response headers */
    )
{
    $ret = '';
    $verb = strtoupper($verb);
    $cookie_str = '';
    $getdata_str = '';
    $postdata_str = '';
    foreach ($getdata as $k => $v)  // '?' before the first pair, '&' between the others
        $getdata_str .= ($getdata_str === '' ? '?' : '&') . urlencode($k) .'='. urlencode($v);
    foreach ($postdata as $k => $v)
        $postdata_str .= urlencode($k) .'='. urlencode($v) .'&';
    foreach ($cookie as $k => $v)
        $cookie_str .= urlencode($k) .'='. urlencode($v) .'; ';
    $crlf = "\r\n";
    $req = $verb .' '. $uri . $getdata_str .' HTTP/1.1' . $crlf;
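    $req .= 'Host: '. $ip . $crlf;
    $req .= 'Connection: close' . $crlf; // otherwise an HTTP/1.1 server may keep the socket open and the read loop only ends at the timeout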
    $req .= 'User-Agent: Mozilla/5.0 Firefox/3.6.12' . $crlf;
    $req .= 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' . $crlf;
    $req .= 'Accept-Language: en-us,en;q=0.5' . $crlf;
    $req .= 'Accept-Encoding: identity' . $crlf; // the body is not decompressed below, so ask for it uncompressed
    $req .= 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' . $crlf;
    foreach ($custom_headers as $k => $v)
        $req .= $k .': '. $v . $crlf;
    if (!empty($cookie_str))
        $req .= 'Cookie: '. substr($cookie_str, 0, -2) . $crlf;
    if ($verb == 'POST' && !empty($postdata_str)){
        $postdata_str = substr($postdata_str, 0, -1);
        $req .= 'Content-Type: application/x-www-form-urlencoded' . $crlf;
        $req .= 'Content-Length: '. strlen($postdata_str) . $crlf . $crlf;
        $req .= $postdata_str;
    }   
    else $req .= $crlf;
    if ($req_hdr)
        $ret .= $req;
    if (($fp = @fsockopen($ip, $port, $errno, $errstr)) == false)
        return "Error $errno: $errstr\n";
    stream_set_timeout($fp, (int)($timeout / 1000), ($timeout % 1000) * 1000); // split milliseconds into seconds + microseconds
    fputs($fp, $req);
    while (($line = fgets($fp)) !== false) $ret .= $line; // stop at EOF/timeout, not on a line that reads "0"
    fclose($fp);
    if (!$res_hdr)
        $ret = substr($ret, strpos($ret, "\r\n\r\n") + 4);
    return $ret;
}

1 answer:

Answer 0 (score: 2)

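First, a 301 is not an "error" as such; it means you are being redirected. You need to parse the response headers, take the value of the Location: header (which the HTTP specification says a redirect response should include), and request that URI.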
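A minimal sketch of that redirect handling, assuming the raw response still contains its headers (with the question's http_request() that means passing $res_hdr = true). get_redirect_location() and follow_redirects() are hypothetical helpers, not part of any library, and $fetch stands in for whatever actually performs the request. For brevity it assumes the Location value is an absolute URL, although the current HTTP specification also allows a relative reference there.

```php
<?php
// Hypothetical helper: return the Location header of a raw 3xx response
// (headers + body), or null if it is not a redirect or has no Location.
function get_redirect_location($raw_response)
{
    $head_end = strpos($raw_response, "\r\n\r\n");
    $head = ($head_end === false) ? $raw_response : substr($raw_response, 0, $head_end);
    $lines = explode("\r\n", $head);
    // Status line, e.g. "HTTP/1.1 301 Moved Permanently"
    if (!preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $lines[0], $m)) {
        return null;
    }
    $status = (int)$m[1];
    if ($status < 300 || $status > 399) {
        return null;
    }
    foreach (array_slice($lines, 1) as $line) {
        // Header field names are case-insensitive
        if (stripos($line, 'location:') === 0) {
            return trim(substr($line, strlen('location:')));
        }
    }
    return null;
}

// Hypothetical helper: keep requesting until the response is not a redirect.
// $fetch is any callable that takes a URL and returns the raw response.
// Assumes absolute Location values (relative ones are not resolved here).
function follow_redirects($url, $fetch, $max_redirects = 5)
{
    for ($i = 0; $i <= $max_redirects; $i++) {
        $response = call_user_func($fetch, $url);
        $location = get_redirect_location($response);
        if ($location === null) {
            return $response;   // final, non-redirect response
        }
        $url = $location;       // request the URI named by the Location header
    }
    return false;               // gave up: too many redirects
}
```

Capping the number of hops guards against redirect loops; cURL's CURLOPT_MAXREDIRS option serves the same purpose.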

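Second, the function above does not appear to provide any support for HTTPS URLs. For that, your PHP installation needs the OpenSSL extension, and you also have to actually use it: you can make the function above open an encrypted connection by prefixing the address in the $ip argument with ssl:// or tls:// (and using port 443), but you cannot simply pass the bare IP. Note that the function also sends $ip as the Host: header, and that header needs the bare hostname without the prefix.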
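A sketch of how a raw-socket client could derive those values from a URL, assuming the OpenSSL extension is loaded when the result is actually used. socket_target_for_url() is a hypothetical helper, not part of the question's function; it only splits the URL, so it runs without network access.

```php
<?php
// Hypothetical helper: work out what a raw-socket client needs for a URL.
// For https, fsockopen() must be given an "ssl://"-prefixed address
// (requires the OpenSSL extension), while the Host: header carries the
// plain hostname.
function socket_target_for_url($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return false;                       // not an absolute URL
    }
    $scheme = isset($parts['scheme']) ? strtolower($parts['scheme']) : 'http';
    $https = ($scheme === 'https');
    $uri = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $uri .= '?' . $parts['query'];
    }
    return array(
        'address'     => ($https ? 'ssl://' : '') . $parts['host'],
        'port'        => isset($parts['port']) ? $parts['port'] : ($https ? 443 : 80),
        // Host: includes the port only when the URL names one explicitly
        'host_header' => $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : ''),
        'uri'         => $uri,
    );
}
```

To open the connection you would then call something like fsockopen($t['address'], $t['port'], $errno, $errstr) and send "Host: " . $t['host_header'] in the request, keeping the ssl:// prefix out of the header.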

Third, the usual way to do this kind of thing is with the cURL extension. You would do something like this:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://pre.corrupt-net.org/search.php?search=Lasse_Stefanz-Bara_Du-SE-CD-FLAC-1995-LoKET'); // Set the URL
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Get the result from the execution

if (($result = curl_exec($ch)) === FALSE) { // Execute the request
  echo "cURL failed! Error: ".curl_error($ch);
} else {
  echo "Success! Result: $result";
}

curl_close($ch);

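Alternatively, if cURL is not available or you don't want to use it for some reason, you could use my HTTPRequest class, which is PHP 4 compatible and requires no extensions (apart from OpenSSL for HTTPS requests). It is documented (more or less) in the comments at the top of the script. You would do something like this: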

$request = new httprequest(); // Create an object

// Set the request URL, then send the request; report the first step that fails
if (!$request->setRequestURL('https://pre.corrupt-net.org/search.php?search=Lasse_Stefanz-Bara_Du-SE-CD-FLAC-1995-LoKET')) {
  echo "Failed! Error: ".$request->getLastErrorStr()."<br>\r\n";
} else if (!$request->sendRequest()) {
  echo "Failed! Error: ".$request->getLastErrorStr()."<br>\r\n";
} else {
  echo "Success! Result: ".$request->getResponseBodyData(TRUE);
}

Also, many scene PreDB operators/providers are not keen on automated scraping, so you may get banned...