Alexa bot is being redirected even though it is in the good bots list

Asked: 2019-11-14 12:10:07

Tags: php bots

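I am trying to redirect bad bots to an error page. However, I noticed in cPanel that the script is redirecting the Alexa bot, Mozilla/5.0 (compatible; ia_archiver/1.0; +http://www.alexa.com/help/webmasters; crawler@alexa.com), even though I have it listed as a good bot. Can someone tell me why? I tried adding crawler@alexa.com and ia_archiver/1.0 to the good bots list, but that did not help.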

// ---------------------------------------------------------------------------------------------------------------
// Banned IP Addresses and Bots - Redirects banned visitors who make it past the .htaccess and/or robots.txt files to a URL.
// The $banned_ip_addresses array can contain both full and partial IP addresses, i.e. Full = 123.456.789.101, Partial = 123.456.789. or 123.456. or 123.
// Use partial IP addresses to include all IP addresses that begin with a partial IP address. The partial IP addresses must end with a period.
// The $banned_bots, $banned_unknown_bots, and $good_bots arrays should contain keyword strings found within the User Agent string.
// The $banned_unknown_bots array is used to identify unknown robots (identified by 'bot' followed by a space or one of the following characters _+:,.;/\-).
// The $good_bots array contains keyword strings used as exemptions when checking for $banned_unknown_bots. If you do not want to use the $good_bots array, i.e.
// $good_bots = array(), then you must remove the keyword strings 'bot.','bot/','bot-' from the $banned_unknown_bots array or else the good bots will also be banned.
$banned_ip_addresses = array('');
$banned_bots = array('.ru', 'AhrefsBot', 'crawl', 'crawler', 'DotBot', 'linkdex', 'majestic', 'meanpath', 'PageAnalyzer', 'robot', 'rogerbot', 'semalt', 'SeznamBot', 'spider','MJ12Bot','SEMrushBot','MauiBot','Acunetix','FHscan','Baiduspider');
$banned_unknown_bots = array('bot ', 'bot_', 'bot+', 'bot:', 'bot,', 'bot;', 'bot\\', 'bot.', 'bot/', 'bot-');
$good_bots = array('Google', 'Googlebot', 'MSN', 'bing', 'bingbot', 'Slurp', 'Yahoo', 'DuckDuck', 'STBot', 'Mediapartners-Google', 'ia_archiver', 'LinkedInBot','YandexBot');
$banned_redirect_url = 'https://mysite.com/?view=page&pagename=welcome';
// Visitor's IP address and Browser (User Agent)
$ip_address = $_SERVER['REMOTE_ADDR'];
// Fall back to an empty string when the request sends no User-Agent header
$browser = $_SERVER['HTTP_USER_AGENT'] ?? '';
// Declared Temporary Variables
$ipfound = $piece = $botfound = $gbotfound = $ubotfound = '';
// Checks for Banned IP Addresses and Bots
if ($banned_redirect_url != '') {
    // Checks for Banned IP Address
    if (!empty($banned_ip_addresses)) {
        if (in_array($ip_address, $banned_ip_addresses)) {
            $ipfound = 'found';
        }
        if ($ipfound != 'found') {
            $ip_pieces = explode('.', $ip_address);
            foreach ($ip_pieces as $value) {
                $piece = $piece . $value . '.';
                if (in_array($piece, $banned_ip_addresses)) {
                    $ipfound = 'found';
                    break;
                }
            }
        }
        if ($ipfound == 'found') {
            header("location: $banned_redirect_url");
            exit();
        }
    }
    // Checks for Banned Bots
    if (!empty($banned_bots)) {
        foreach ($banned_bots as $bbvalue) {
            $pos1 = stripos($browser, $bbvalue);
            if ($pos1 !== false) {
                $botfound = 'found';
                break;
            }
        }
        if ($botfound == 'found') {
            header("location: $banned_redirect_url");
            exit();
        }
    }
    // Checks for Banned Unknown Bots
    if (!empty($good_bots)) {
        foreach ($good_bots as $gbvalue) {
            $pos2 = stripos($browser, $gbvalue);
            if ($pos2 !== false) {
                $gbotfound = 'found';
                break;
            }
        }
    }
    if ($gbotfound != 'found') {
        if (!empty($banned_unknown_bots)) {
            foreach ($banned_unknown_bots as $bubvalue) {
                $pos3 = stripos($browser, $bubvalue);
                if ($pos3 !== false) {
                    $ubotfound = 'found';
                    break;
                }
            }
            if ($ubotfound == 'found') {
                header("location: $banned_redirect_url");
                exit();
            }
        }
    }
}
// ---------------------------------------------------------------------------------------------------------------
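
One way to narrow this down is to run the Alexa User Agent string through each keyword list separately and see which keyword hits first. The sketch below copies the three arrays from the script above; first_match and should_redirect are hypothetical helpers that are not part of the original script. should_redirect is only one possible reordering (good bots checked first), and it assumes that a good-bot match should exempt a visitor from every User Agent check, not only from the unknown-bot check.

```php
<?php
// Hypothetical helper (not in the original script): returns the first keyword
// from $keywords found in $user_agent (case-insensitive), or null if none match.
function first_match(string $user_agent, array $keywords): ?string
{
    foreach ($keywords as $keyword) {
        if ($keyword !== '' && stripos($user_agent, $keyword) !== false) {
            return $keyword;
        }
    }
    return null;
}

// Hypothetical alternative ordering: consult the good list before either banned list.
// Assumption: a good-bot match exempts the visitor from both banned checks.
function should_redirect(string $user_agent, array $good, array $banned, array $unknown): bool
{
    if (first_match($user_agent, $good) !== null) {
        return false;
    }
    return first_match($user_agent, $banned) !== null
        || first_match($user_agent, $unknown) !== null;
}

// Keyword lists copied from the script in the question.
$banned_bots = array('.ru', 'AhrefsBot', 'crawl', 'crawler', 'DotBot', 'linkdex', 'majestic', 'meanpath', 'PageAnalyzer', 'robot', 'rogerbot', 'semalt', 'SeznamBot', 'spider', 'MJ12Bot', 'SEMrushBot', 'MauiBot', 'Acunetix', 'FHscan', 'Baiduspider');
$banned_unknown_bots = array('bot ', 'bot_', 'bot+', 'bot:', 'bot,', 'bot;', 'bot\\', 'bot.', 'bot/', 'bot-');
$good_bots = array('Google', 'Googlebot', 'MSN', 'bing', 'bingbot', 'Slurp', 'Yahoo', 'DuckDuck', 'STBot', 'Mediapartners-Google', 'ia_archiver', 'LinkedInBot', 'YandexBot');

// The User Agent reported in the question.
$alexa = 'Mozilla/5.0 (compatible; ia_archiver/1.0; +http://www.alexa.com/help/webmasters; crawler@alexa.com)';

// Which keyword, if any, each list matches for this User Agent.
var_dump(first_match($alexa, $banned_bots));
var_dump(first_match($alexa, $good_bots));
var_dump(first_match($alexa, $banned_unknown_bots));

// Result of the reordered check for the same User Agent.
var_dump(should_redirect($alexa, $good_bots, $banned_bots, $banned_unknown_bots));
```

With the arrays as posted, the first banned keyword that matches is 'crawl', because the User Agent contains crawler@alexa.com, and the banned-bots check runs before $good_bots is ever consulted. Keep in mind that User Agent strings are easy to spoof, so checking the good list first also lets through any visitor that merely claims to be one of those bots.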

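For some reason, SO thinks my post does not have enough text and will not let me submit it. So if the code is long, are we supposed to just type anything to pad the post? I am not sure that is a good idea. Sometimes the code is long. What are we supposed to do?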

0 Answers:

No answers