Finding URL redirects with HTTP headers and cURL?

Date: 2011-12-16 12:16:20

Tags: php redirect curl http-headers

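I'm trying to write a redirect checker that tests whether a URL is search-engine friendly. It has to check whether the URL is redirected and, if it is, report whether the redirect is SEO friendly (status code 301) or not (e.g. 302).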

Here is a similar tool I found: http://www.webconfs.com/redirect-check.php

It should also be able to follow multiple redirects (e.g. from A to B to C) and tell me that A redirects to C.
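To make the "SEO friendly or not" decision explicit, it can help to classify each status code separately from the fetching logic. The following is a minimal sketch; `classify_status` and its return labels are hypothetical names. It assumes that any permanent redirect counts as the SEO-friendly case, which extends the question's 301-only rule to 308 Permanent Redirect (RFC 7538). Note that 304 Not Modified is a caching response to a conditional request, not a redirect.

```php
<?php
// Hypothetical helper: classify an HTTP status code for a redirect checker.
// 301 and 308 are permanent redirects; 302, 303 and 307 are temporary.
// 304 Not Modified is a cache validation response, not a redirect.
function classify_status(int $code): string
{
    switch ($code) {
        case 301:
        case 308:
            return 'permanent';
        case 302:
        case 303:
        case 307:
            return 'temporary';
        case 304:
            return 'not-a-redirect';
        default:
            // e.g. 300 Multiple Choices: a 3xx code that does not fit either group
            return ($code >= 300 && $code < 400) ? 'other-3xx' : 'not-a-redirect';
    }
}

foreach ([301, 302, 304, 200] as $code) {
    echo $code, ': ', classify_status($code), "\n";
}
```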

Here is what I have so far, but it does not work correctly (for example, when I enter www.example.com, it does not detect the redirect to www.example.com/page1):

<?php
// You can edit the message shown for each status code here
$httpcode  = array();
$httpcode["200"] = "Ok";
$httpcode["201"] = "Created";
$httpcode["302"] = "Found";
$httpcode["301"] = "Moved Permanently";
$httpcode["304"] = "Not Modified";
$httpcode["400"] = "Bad Request";


if(count($_POST)>0)
{
    $url = $_POST["url"];
    $curlurl = "http://".$url."/";
    $ch = curl_init();
    // Set URL to download
    curl_setopt($ch, CURLOPT_URL, $curlurl);

    // User agent
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]);
    // Include the header in the output? (0 = no, 1 = yes)
    curl_setopt($ch, CURLOPT_HEADER, 0);

    // Should cURL return or print out the data? (true = return, false = print)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // Timeout in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);

    // Download the given URL, and return output
    $output = curl_exec($ch);

    $curlinfo = curl_getinfo($ch);

    if(($curlinfo["http_code"]=="301") || ($curlinfo["http_code"]=="302"))
    {
        $ch = curl_init();
        // Set URL to download
        curl_setopt($ch, CURLOPT_URL, $curlurl);

        // User agent
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]);
        // Include the header in the output? (0 = no, 1 = yes)
        curl_setopt($ch, CURLOPT_HEADER, 0);

        // Should cURL return or print out the data? (true = return, false = print)
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        // Timeout in seconds
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);


        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        // Download the given URL, and return output
        $output = curl_exec($ch);

        $curlinfo = curl_getinfo($ch);
        echo $url." is redirected to ".$curlinfo["url"];
    }
    else
    {
        echo $url." is not getting redirected";
    }

    // Close the cURL resource, and free system resources
    curl_close($ch);
}
?>
<form action="" method="post">
http://<input type="text" name="url" size="30" />/ <b>e.g. www.google.com</b><br/>
<input type="submit" value="Submit" />
</form>

1 answer:

Answer 0 (score: 7)

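If you want to record every redirect, you have to follow the redirects yourself and turn off cURL's automatic "Location following" (CURLOPT_FOLLOWLOCATION):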

function curl_trace_redirects($url, $timeout = 15) {

    $result = array();
    $ch = curl_init();

    $trace = true;
    $currentUrl = $url;

    $urlHist = array();
    while($trace && $timeout > 0 && !isset($urlHist[$currentUrl])) {
        $urlHist[$currentUrl] = true;

        curl_setopt($ch, CURLOPT_URL, $currentUrl);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
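        // CURLOPT_TIMEOUT takes whole seconds, but the remaining budget is fractional
        // (a value truncated to 0 would disable the timeout), so pass milliseconds
        curl_setopt($ch, CURLOPT_TIMEOUT_MS, max(1, (int) ($timeout * 1000)));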

        $output = curl_exec($ch);

        if($output === false) {
            $traceItem = array(
                'errno' => curl_errno($ch),
                'error' => curl_error($ch),
            );

            $trace = false;
        } else {
            $curlinfo = curl_getinfo($ch);

            if(isset($curlinfo['total_time'])) {
                $timeout -= $curlinfo['total_time'];
            }

            if(!isset($curlinfo['redirect_url'])) {
                $curlinfo['redirect_url'] = get_redirect_url($output);
            }

            if(!empty($curlinfo['redirect_url'])) {
                $currentUrl = $curlinfo['redirect_url'];
            } else {
                $trace = false;
            }

            $traceItem = $curlinfo;
        }

        $result[] = $traceItem;
    }

    if($timeout < 0) {
        $result[] = array('timeout' => $timeout);
    }

    curl_close($ch);

    return $result;
}

// apparently 'redirect_url' is not available on all curl-versions
// so we fetch the location header ourselves
function get_redirect_url($header) {
    if(preg_match('/^Location:\s+(.*)$/mi', $header, $m)) {
        return trim($m[1]);
    }

    return "";
}
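The Location fallback above returns the header value verbatim, and HTTP allows that value to be a relative reference (RFC 7231, section 7.1.2), so passing it straight back to CURLOPT_URL would not point at the right resource. A minimal sketch of resolving it against the URL that was just requested; `resolve_location` is a hypothetical helper, not part of the answer, and it deliberately skips "./" and "../" normalisation and drops any user:password from the base URL.

```php
<?php
// Hypothetical helper: resolve a Location header value against the URL that
// produced it. Simplified: "./" and "../" segments are not normalised, and
// user:password parts of the base URL are dropped.
function resolve_location(string $base, string $location): string
{
    // Already absolute, e.g. "http://www.example.com/page1"
    if (parse_url($location, PHP_URL_SCHEME) !== null) {
        return $location;
    }

    $parts = parse_url($base);
    $scheme = $parts['scheme'];

    // Protocol-relative, e.g. "//cdn.example.com/page1"
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location;
    }

    $origin = $scheme . '://' . $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path = isset($parts['path']) ? $parts['path'] : '/';

    // Absolute path, e.g. "/page1"
    if (strpos($location, '/') === 0) {
        return $origin . $location;
    }
    // Query only, e.g. "?page=2": keep the base path
    if (strpos($location, '?') === 0) {
        return $origin . $path . $location;
    }
    // Relative path, e.g. "page1": replace the last segment of the base path
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $location;
}
```

Inside curl_trace_redirects() you could then write `$currentUrl = resolve_location($currentUrl, $curlinfo['redirect_url']);`. When curl_getinfo() supplies 'redirect_url' itself, libcurl has already made it absolute, and an absolute URL passes through this helper unchanged.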

You use it like this:

$res = curl_trace_redirects("http://www.example.com");
foreach($res as $item) {
    if(isset($item['timeout'])) {
        echo "Timeout reached!\n";
    } else if(isset($item['error'])) {
        echo "error: ", $item['error'], "\n";
    } else {
        echo $item['url'];
        if(!empty($item['redirect_url'])) {
            // redirection
            echo " -> (", $item['http_code'], ")";
        }

        echo "\n";
    }
}
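The question asks for a report such as "A redirects to C" plus a verdict on whether the redirects are 301s. A minimal sketch of that last step, operating on the array that curl_trace_redirects() returns; `summarize_trace` and its message wording are hypothetical, and it applies the question's rule that the chain is SEO friendly only when every hop is a 301. It assumes successful items are curl_getinfo() arrays with 'url', 'http_code' and 'redirect_url' keys, as in the answer's code.

```php
<?php
// Hypothetical helper: summarise the array returned by curl_trace_redirects()
// in the terms of the question ("A redirects to C", and whether every hop is a 301).
function summarize_trace(array $trace): string
{
    foreach ($trace as $item) {
        if (isset($item['timeout']) || isset($item['error'])) {
            return 'trace incomplete (timeout or cURL error)';
        }
    }
    if (count($trace) === 0) {
        return 'nothing was fetched';
    }

    $first = $trace[0]['url'];
    $last = $trace[count($trace) - 1];
    if (!empty($last['redirect_url'])) {
        // curl_trace_redirects() stops early on an already-visited URL or when time runs out
        return "redirect chain did not finish at {$last['url']} (loop or time budget used up)";
    }
    if (count($trace) === 1) {
        return "$first is not redirected";
    }

    // Every item except the last one answered with a redirect
    $codes = array_map(function ($hop) {
        return (int) $hop['http_code'];
    }, array_slice($trace, 0, -1));
    $seoFriendly = count(array_diff($codes, [301])) === 0;

    return sprintf(
        '%s redirects to %s via %s (%s)',
        $first,
        $last['url'],
        implode(', ', $codes),
        $seoFriendly ? 'SEO friendly: every hop is a 301' : 'not SEO friendly: non-301 hop found'
    );
}
```

Usage, with the answer's function: `echo summarize_trace(curl_trace_redirects("http://www.example.com")), "\n";`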

My code may not be fully thought through, but I think it is a good start.

Edit

Here is some sample output:

http://midas/~stefan/test/redirect/fritzli.html -> (302)
http://midas/~stefan/test/redirect/hansli.html -> (301)
http://midas/~stefan/test/redirect/heiri.html