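What I want to do is find out the last/final URL after following the redirects.
I don't want to use cURL. I'd like to stick with pure PHP (stream wrappers).
Right now I have a URL (say http://domain.test), and I use get_headers() to get specific headers from that page. get_headers() also returns multiple Location: headers (see the edit below). Is there a way to use those headers to build the final URL? Or is there a PHP function that does this automatically?
Edit: get_headers() follows redirects and returns all headers for each response/redirect, so I have all the Location: headers.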
Answer 0 (score: 40)
function getRedirectUrl ($url) {
    stream_context_set_default(array(
        'http' => array(
            'method' => 'HEAD'
        )
    ));
    $headers = get_headers($url, 1);
    if ($headers !== false && isset($headers['Location'])) {
        return $headers['Location'];
    }
    return false;
}
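Additionally...
As mentioned in the comments, the final item in $headers['Location'] will be the final URL after all redirects. It is important to note, though, that it won't always be an array. Sometimes it is just a plain string. In that case, trying to access the last array element will most likely return a single character. Not ideal.
If you are only interested in the final URL, after all redirects, I would suggest changing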
return $headers['Location'];
to
return is_array($headers['Location']) ? array_pop($headers['Location']) : $headers['Location'];
... which is just shorthand for:
if (is_array($headers['Location'])) {
    return array_pop($headers['Location']);
} else {
    return $headers['Location'];
}
This fix handles both cases (array, non-array) and removes the need to clean up the final URL after calling the function.
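The array-or-string handling above can also be pulled into a small helper. The following is only a sketch: lastLocation() is a hypothetical name, and the key lowercasing is an extra assumption on my part, added because HTTP header names are case-insensitive and a server may send location instead of Location.

```php
<?php
/**
 * Hypothetical helper: pick the final Location value from the array
 * returned by get_headers($url, 1). Returns false if there is none.
 */
function lastLocation(array $headers)
{
    // Header names are case-insensitive, so normalise the string keys.
    // (If both "Location" and "location" keys exist, the later one wins.)
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['location'])) {
        return false;
    }
    // The cast wraps a plain string in a one-element array,
    // so both shapes are handled by the same code path.
    $locations = (array) $headers['location'];
    return end($locations);
}
```

With this in place, getRedirectUrl() could return lastLocation($headers) whenever $headers is not false.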
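If there is no redirect, the function returns false. Likewise, it also returns false for invalid URLs (invalid for any reason). It is therefore important to check the URL for validity before running this function, or else incorporate the redirect check into your validation.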
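One possible way to do that validity check is sketched below. isHttpUrl() is a hypothetical name, and restricting the scheme to http/https is my assumption about what "valid" means here. It only checks syntax, not whether the host actually resolves.

```php
<?php
/**
 * Hypothetical pre-check: true only for syntactically valid
 * http:// or https:// URLs.
 */
function isHttpUrl($url)
{
    // FILTER_VALIDATE_URL rejects strings without a scheme, e.g. "domain.test".
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // It accepts other schemes (ftp:, mailto:, ...), so narrow it down.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}
```

Calling isHttpUrl($url) before getRedirectUrl($url) lets you tell "invalid URL" apart from "no redirect", which the function itself reports with the same false.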
Answer 1 (score: 29)
/**
* get_redirect_url()
* Gets the address that the provided URL redirects to,
* or FALSE if there's no redirect.
*
* @param string $url
* @return string
*/
function get_redirect_url($url){
$redirect_url = null;
$url_parts = @parse_url($url);
if (!$url_parts) return false;
if (!isset($url_parts['host'])) return false; //can't process relative URLs
if (!isset($url_parts['path'])) $url_parts['path'] = '/';
$sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
if (!$sock) return false;
$request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
$request .= 'Host: ' . $url_parts['host'] . "\r\n";
$request .= "Connection: Close\r\n\r\n";
fwrite($sock, $request);
$response = '';
while(!feof($sock)) $response .= fread($sock, 8192);
fclose($sock);
if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
if ( substr($matches[1], 0, 1) == "/" )
return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
else
return trim($matches[1]);
} else {
return false;
}
}
/**
* get_all_redirects()
* Follows and collects all redirects, in order, for the given URL.
*
* @param string $url
* @return array
*/
function get_all_redirects($url){
$redirects = array();
while ($newurl = get_redirect_url($url)){
if (in_array($newurl, $redirects)){
break;
}
$redirects[] = $newurl;
$url = $newurl;
}
return $redirects;
}
/**
* get_final_url()
* Gets the address that the URL ultimately leads to.
* Returns $url itself if it isn't a redirect.
*
* @param string $url
* @return string
*/
function get_final_url($url){
$redirects = get_all_redirects($url);
if (count($redirects)>0){
return array_pop($redirects);
} else {
return $url;
}
}
And, as always, credit where it's due:
http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
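Note that get_redirect_url() above rebuilds a relative Location from the scheme and host only, so a non-default port is dropped, and path-relative or protocol-relative values are returned unchanged. A rough sketch of a more complete resolver follows. resolveLocation() is a hypothetical name, not part of the code above, and it deliberately does not normalise "../" segments.

```php
<?php
/**
 * Hypothetical helper: turn a Location header value into an absolute URL,
 * using the URL that was requested as the base.
 * Returns false if the base URL has no scheme or host.
 */
function resolveLocation($baseUrl, $location)
{
    $location = trim($location);
    // Already absolute (starts with a scheme): use it as-is.
    if (preg_match('/^[a-z][a-z0-9+.-]*:\/\//i', $location)) {
        return $location;
    }
    $base = parse_url($baseUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return false;
    }
    // Protocol-relative (//host/path): reuse only the scheme.
    if (substr($location, 0, 2) === '//') {
        return $base['scheme'] . ':' . $location;
    }
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    // Root-relative (/path): keep scheme, host and port.
    if (substr($location, 0, 1) === '/') {
        return $origin . $location;
    }
    // Path-relative: replace everything after the last "/" of the base path.
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}
```

For example, resolveLocation('http://domain.test:8080/a/b', '/c') keeps the port and yields http://domain.test:8080/c.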
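Answer 2 (score: 3)
Although the OP wants to avoid cURL, it is best to use it when it is available. Here is a solution with several advantages, among them that it copes with servers that return a lowercase location header name (neither xaav's nor webjay's answer handles this). Here is the function: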
function findUltimateDestination($url, $maxRequests = 10)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRequests);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    //customize user agent if you desire...
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
    $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $url;
}
Here is a more verbose version that lets you inspect the redirect chain yourself rather than having curl follow it.
function findUltimateDestination($url, $maxRequests = 10)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    //customize user agent if you desire...
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');
    while ($maxRequests--) {
        //fetch
        curl_setopt($ch, CURLOPT_URL, $url);
        $response = curl_exec($ch);
        //try to determine redirection url
        $location = '';
        if (in_array(curl_getinfo($ch, CURLINFO_HTTP_CODE), [301, 302, 303, 307, 308])) {
            if (preg_match('/Location:(.*)/i', $response, $match)) {
                $location = trim($match[1]);
            }
        }
        if (empty($location)) {
            //we've reached the end of the chain...
            curl_close($ch);
            return $url;
        }
        //build next url
        if ($location[0] == '/') {
            $u = parse_url($url);
            $url = $u['scheme'] . '://' . $u['host'];
            if (isset($u['port'])) {
                $url .= ':' . $u['port'];
            }
            $url .= $location;
        } else {
            $url = $location;
        }
    }
    //too many requests: give up
    curl_close($ch);
    return null;
}
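As an example of a redirect chain that this function handles but the others don't, try:
echo findUltimateDestination('http://dx.doi.org/10.1016/j.infsof.2016.05.005');
At the time of writing, this involves 4 requests, with a mix of Location and location headers.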
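The case-insensitive header matching can be shown in isolation, without any network access. This is only a sketch: extractLocation() is a hypothetical name. Unlike the /Location:(.*)/i pattern used above, it anchors the match at the start of a line, so that a header such as Content-Location is not picked up by accident.

```php
<?php
/**
 * Hypothetical helper: return the last Location header value found in a
 * raw HTTP header string, or null if there is none.
 */
function extractLocation($rawHeaders)
{
    // "m" anchors ^ and $ at line boundaries,
    // "i" matches Location, location, LOCATION, ...
    if (preg_match_all('/^Location:[ \t]*(.*?)[ \t]*\r?$/mi', $rawHeaders, $m)) {
        return end($m[1]);
    }
    return null;
}

$raw = "HTTP/1.1 301 Moved Permanently\r\n"
     . "location: http://a.test/next\r\n"
     . "Content-Length: 0\r\n\r\n";
echo extractLocation($raw), "\n"; // http://a.test/next
```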
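Answer 3 (score: 2)
xaav's answer is very good, except for two issues, including the following:
Some sites will not work because they do not recognise the underlying user agent (client browser) => this is solved simply by adding a User-Agent header field. I added an Android user agent (you can find other user agent examples at http://www.useragentstring.com/pages/useragentstring.php, depending on your needs):
$request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";
Here is the modified answer: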
/**
* get_redirect_url()
* Gets the address that the provided URL redirects to,
* or FALSE if there's no redirect.
*
* @param string $url
* @return string
*/
function get_redirect_url($url){
$redirect_url = null;
$url_parts = @parse_url($url);
if (!$url_parts) return false;
if (!isset($url_parts['host'])) return false; //can't process relative URLs
if (!isset($url_parts['path'])) $url_parts['path'] = '/';
$sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
if (!$sock) return false;
$request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
$request .= 'Host: ' . $url_parts['host'] . "\r\n";
$request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";
$request .= "Connection: Close\r\n\r\n";
fwrite($sock, $request);
$response = '';
while(!feof($sock)) $response .= fread($sock, 8192);
fclose($sock);
if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
if ( substr($matches[1], 0, 1) == "/" )
return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
else
return trim($matches[1]);
} else {
return false;
}
}
/**
* get_all_redirects()
* Follows and collects all redirects, in order, for the given URL.
*
* @param string $url
* @return array
*/
function get_all_redirects($url){
$redirects = array();
while ($newurl = get_redirect_url($url)){
if (in_array($newurl, $redirects)){
break;
}
$redirects[] = $newurl;
$url = $newurl;
}
return $redirects;
}
/**
* get_final_url()
* Gets the address that the URL ultimately leads to.
* Returns $url itself if it isn't a redirect.
*
* @param string $url
* @return string
*/
function get_final_url($url){
$redirects = get_all_redirects($url);
if (count($redirects)>0){
return array_pop($redirects);
} else {
return $url;
}
}
Answer 4 (score: 0)
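Added to the code from the answers by @xaav and @Houssem BDIOUI: handling for the 404 error case and for the case where the URL does not respond. In those cases, get_final_url($url) returns the strings 'Error: 404 Not Found' and 'Error: No Responce' respectively (spelled as in the code).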
/**
* get_redirect_url()
* Gets the address that the provided URL redirects to,
* or FALSE if there's no redirect,
* or 'Error: No Responce',
* or 'Error: 404 Not Found'
*
* @param string $url
* @return string
*/
function get_redirect_url($url)
{
$redirect_url = null;
$url_parts = @parse_url($url);
if (!$url_parts)
return false;
if (!isset($url_parts['host']))
return false; //can't process relative URLs
if (!isset($url_parts['path']))
$url_parts['path'] = '/';
$sock = @fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
if (!$sock) return 'Error: No Responce';
$request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?' . $url_parts['query'] : '') . " HTTP/1.1\r\n";
$request .= 'Host: ' . $url_parts['host'] . "\r\n";
$request .= "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36\r\n";
$request .= "Connection: Close\r\n\r\n";
fwrite($sock, $request);
$response = '';
while (!feof($sock))
$response .= fread($sock, 8192);
fclose($sock);
if (stripos($response, '404 Not Found') !== false)
{
return 'Error: 404 Not Found';
}
if (preg_match('/^Location: (.+?)$/m', $response, $matches))
{
if (substr($matches[1], 0, 1) == "/")
return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
else
return trim($matches[1]);
} else
{
return false;
}
}
/**
* get_all_redirects()
* Follows and collects all redirects, in order, for the given URL.
*
* @param string $url
* @return array
*/
function get_all_redirects($url)
{
$redirects = array();
while ($newurl = get_redirect_url($url))
{
if (in_array($newurl, $redirects))
{
break;
}
$redirects[] = $newurl;
$url = $newurl;
}
return $redirects;
}
/**
* get_final_url()
* Gets the address that the URL ultimately leads to.
* Returns $url itself if it isn't a redirect,
* or 'Error: No Responce'
* or 'Error: 404 Not Found',
*
* @param string $url
* @return string
*/
function get_final_url($url)
{
$redirects = get_all_redirects($url);
if (count($redirects) > 0)
{
return array_pop($redirects);
} else
{
return $url;
}
}
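The 404 check above searches the whole response for the text '404 Not Found', which could also match if that text happens to appear in another header, such as a Location value. As a hedged alternative, the status code can be read from the first line of the response. parseStatusCode() is a hypothetical name and is not part of the code above.

```php
<?php
/**
 * Hypothetical helper: read the numeric status code from the status line
 * of a raw HTTP response. Returns null if there is no status line.
 */
function parseStatusCode($response)
{
    // Matches e.g. "HTTP/1.1 404 Not Found" or "HTTP/2 200" at the very start.
    if (preg_match('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})\b/', $response, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

get_redirect_url() could then compare parseStatusCode($response) === 404 instead of calling stripos().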
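Answer 5 (score: 0)
After hours of reading Stackoverflow and trying all the custom functions people have written, as well as all the cURL suggestions, none of which handled more than 1 redirect, I managed to put together my own logic.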
$url = 'facebook.com';
// First, find out whether only the domain name was given or a protocol was prepended
if (!preg_match('/(http|https):\/\/[a-z0-9]+[a-z0-9_\/]*/', $url)) {
    $url = 'http://' . $url;
    echo '<p>No protocol given, defaulting to http://</p>';
}
// Print out the initial URL
echo '<p>Initial URL: ' . $url . '</p>';
// Use the HEAD method when sending the request
stream_context_set_default(array('http' => array('method' => 'HEAD')));
// Probe for headers
$headers = get_headers($url, 1);
// If there is a Location header, trigger the logic
if (isset($headers['Location'])) {
    // If there is more than 1 redirect, Location will be an array
    if (is_array($headers['Location'])) {
        // In that case we want the last element of the array (the last Location)
        $url = $headers['Location'][array_key_last($headers['Location'])];
    } else {
        // If it's not an array, there was only 1 redirect
        $url = $headers['Location'];
    }
    echo '<p>Redirected URL: ' . $url . '</p>';
} else {
    echo '<p>URL: ' . $url . '</p>';
}
// You can now send get_headers to the latest location
$headers = get_headers($url, 1);
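One thing to be aware of: stream_context_set_default() changes the default context for every later stream call in the script, so the final get_headers() call above is also sent as a HEAD request. If that is not what you want, a per-call context is an option. The sketch below assumes PHP 7.1 or later, where get_headers() accepts a context as its third argument. headContext() and headHeaders() are hypothetical names, and the timeout and redirect limits are arbitrary example values.

```php
<?php
/**
 * Hypothetical helper: build a stream context that sends HEAD requests,
 * without touching the script-wide default context.
 */
function headContext($timeoutSeconds = 10, $maxRedirects = 10)
{
    return stream_context_create([
        'http' => [
            'method'        => 'HEAD',
            'timeout'       => $timeoutSeconds, // seconds
            'max_redirects' => $maxRedirects,   // 1 or less: follow no redirects
        ],
    ]);
}

/**
 * Hypothetical wrapper: fetch headers with HEAD for this call only.
 */
function headHeaders($url)
{
    // The third argument of get_headers() is available from PHP 7.1.
    return get_headers($url, 1, headContext());
}
```

The returned array has the same shape as before, so the Location handling shown above still applies to the result of headHeaders($url).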