过去几天我一直试图从网站上获取请求,但没有成功。 我一直收到错误301。 是否有人能够帮助我抓住此页面的内容:https://pre.corrupt-net.org/search.php?search=Lasse_Stefanz-Bara_Du-SE-CD-FLAC-1995-LoKET
我期待着你的回复。
编辑: 这是我用过的php函数:
function http_request(
$verb = 'GET', /* HTTP Request Method (GET and POST supported) */
$ip, /* Target IP/Hostname */
$port = 80, /* Target TCP port */
$uri = '/', /* Target URI */
$getdata = array(), /* HTTP GET Data ie. array('var1' => 'val1', 'var2' => 'val2') */
$postdata = array(), /* HTTP POST Data ie. array('var1' => 'val1', 'var2' => 'val2') */
$cookie = array(), /* HTTP Cookie Data ie. array('var1' => 'val1', 'var2' => 'val2') */
$custom_headers = array(), /* Custom HTTP headers ie. array('Referer: http://localhost/ */
$timeout = 1000, /* Socket timeout in milliseconds */
$req_hdr = false, /* Include HTTP request headers */
$res_hdr = false /* Include HTTP response headers */
)
{
$ret = '';
$verb = strtoupper($verb);
$cookie_str = '';
$getdata_str = count($getdata) ? '?' : '';
$postdata_str = '';
foreach ($getdata as $k => $v)
$getdata_str .= urlencode($k) .'='. urlencode($v);
foreach ($postdata as $k => $v)
$postdata_str .= urlencode($k) .'='. urlencode($v) .'&';
foreach ($cookie as $k => $v)
$cookie_str .= urlencode($k) .'='. urlencode($v) .'; ';
$crlf = "\r\n";
$req = $verb .' '. $uri . $getdata_str .' HTTP/1.1' . $crlf;
$req .= 'Host: '. $ip . $crlf;
$req .= 'User-Agent: Mozilla/5.0 Firefox/3.6.12' . $crlf;
$req .= 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' . $crlf;
$req .= 'Accept-Language: en-us,en;q=0.5' . $crlf;
$req .= 'Accept-Encoding: deflate' . $crlf;
$req .= 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' . $crlf;
foreach ($custom_headers as $k => $v)
$req .= $k .': '. $v . $crlf;
if (!empty($cookie_str))
$req .= 'Cookie: '. substr($cookie_str, 0, -2) . $crlf;
if ($verb == 'POST' && !empty($postdata_str)){
$postdata_str = substr($postdata_str, 0, -1);
$req .= 'Content-Type: application/x-www-form-urlencoded' . $crlf;
$req .= 'Content-Length: '. strlen($postdata_str) . $crlf . $crlf;
$req .= $postdata_str;
}
else $req .= $crlf;
if ($req_hdr)
$ret .= $req;
if (($fp = @fsockopen($ip, $port, $errno, $errstr)) == false)
return "Error $errno: $errstr\n";
stream_set_timeout($fp, 0, $timeout * 1000);
fputs($fp, $req);
while ($line = fgets($fp)) $ret .= $line;
fclose($fp);
if (!$res_hdr)
$ret = substr($ret, strpos($ret, "\r\n\r\n") + 4);
return $ret;
}
答案 0 :(得分:2)
首先,301 is not an "error" as such,表示您正在被重定向。您需要解析响应头,获取Location:
标头的值(HTTP协议规范要求在重定向响应中出现)并请求该URI。
其次,上述功能似乎不提供对访问HTTPS URL的任何支持。您需要为PHP实例安装OpenSSL扩展来执行此操作,并且您还需要实际调用它。您可以通过在ssl://
参数中的地址前面传递tls://
或$ip
来使用上述功能,但您不能简单地传递IP。
第三,执行此类操作的常用方法是使用cURL扩展名。你会做这样的事情:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://pre.corrupt-net.org/search.php?search=Lasse_Stefanz-Bara_Du-SE-CD-FLAC-1995-LoKET'); // Set the URL
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Get the result from the execution
if (($result = curl_exec($ch)) === FALSE) { // Execute the request
echo "cURL failed! Error: ".curl_error($ch);
} else {
echo "Success! Result: $result";
}
curl_close($ch);
或者,如果cURL不可用或您不想出于某种原因使用它,您可以使用my HTTPRequest class,这符合PHP4并且不需要扩展(除了用于HTTPS请求的OpenSSL) 。在脚本顶部的注释中记录(ish)。你会做这样的事情:
$request = new httprequest(); // Create an object
// Set the request URL
if (!$request->setRequestURL('https://pre.corrupt-net.org/search.php?search=Lasse_Stefanz-Bara_Du-SE-CD-FLAC-1995-LoKET')) echo "Failed! Error: ".$request->getLastErrorStr()."<br>\r\n";
// Send the request
if (!$request->sendRequest()) echo "Failed! Error: ".$request->getLastErrorStr()."<br>\r\n";
echo "Success! Result: ".$request->getResponseBodyData(TRUE);
另外,很多Scene PreDB经理/提供商都不太热衷于自动抓取,你可能会被禁止......