Detecting bots with PHP

Time: 2015-12-11 20:21:49

Tags: php bots

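I have a page named index.php and a script for detecting bots, but it is not working properly. If a bot visits index.php, I want to include welcome.php. If the visitor is a real user, welcome.php should not be included. Here is what I have tried so far: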

    function is_bot(){
        $botlist = array("Teoma", "alexa", "froogle", "Gigabot", "inktomi",
            "looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory",
            "Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot",
            "crawler", "www.galaxy.com", "Googlebot", "Scooter", "Slurp",
            "msnbot", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz",
            "Baiduspider", "Feedfetcher-Google", "TechnoratiSnoop", "Rankivabot",
            "Mediapartners-Google", "Sogou web spider", "WebAlta Crawler",
            "TweetmemeBot", "Butterfly", "Twitturls", "Me.dium",
            "Twiceler", "Purebot", "facebookexternalhit",
            "Yandex", "CatchBot", "W3C_Validator", "Jigsaw", "PostRank",
            "Purebot", "Twitterbot",
            "Voyager", "zelist", "pingdom", "favicon");

        foreach($botlist as $bot){
            if(strpos($_SERVER['HTTP_USER_AGENT'],$bot)!==false)
                return true;    // Is a bot
        }
        return false;    // Not a bot
    }

Here is the main problem I am running into. The following does not work:

    if (is_bot()==true) {
        session_destroy();
        include_once('welcome.php');
        exit;
    }

Next, I tried this, but it did not work either:

    if (is_bot()) {
        session_destroy();
        include_once('welcome.php');
        exit;
    }

Please suggest any solution for this situation.

It does work whenever I use it like this:

 if (is_bot())
 $isbot = 1;
 else
 $isbot = 0;

3 answers:

Answer 0 (score: 3)

It would be better to improve your is_bot function and use a regular expression instead of a long, laborious search through a list.

Something like the following may be more useful.

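function is_bot(){
    // Treat a missing User-Agent header as an empty string.
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    // Case-insensitive match against common crawler keywords.
    // preg_match returns 1 on a match and 0 otherwise.
    return preg_match('/bot|curl|spider|google|twitter/i', $ua) === 1;
}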
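
As a rough sketch of how such a check could gate the include described in the question: the helper below takes the user agent as a parameter so it can be exercised without a real request. The name `ua_looks_like_bot` and the keyword list are assumptions for illustration, not part of the original code; `welcome.php` is the file from the question.

```php
<?php
// Hypothetical helper: true when the UA string contains a crawler keyword.
function ua_looks_like_bot($userAgent)
{
    // preg_match returns 1 on a match and 0 on no match.
    return preg_match('/bot|curl|spider|google|twitter/i', (string) $userAgent) === 1;
}

// Assumption: a missing header is treated as an empty string, i.e. "not a bot".
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (ua_looks_like_bot($ua)) {
    // Only destroy a session if one is actually active.
    if (session_status() === PHP_SESSION_ACTIVE) {
        session_destroy();
    }
    include_once 'welcome.php';
    exit;
}
```

Passing the user agent in as an argument keeps the matching logic separate from `$_SERVER`, which makes it easier to try against sample strings.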

Answer 1 (score: 1)

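I am fairly sure the code does work (although it is poorly optimized and formatted; @Imran's solution is cleaner), and that the real problem is how you are testing it.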

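Your own UA string does not contain a "bot" string, because you are visiting as a regular browser user, not as a crawler. Use the Google Chrome developer tools to test, like so: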

  1. F12

  2. CTRL + SHIFT + M

  3. Use the UA box at the top to change your UA string and pretend to be someone else, e.g. "Googlebot", then test it.

  4. Simply visiting another website and navigating back to your own site does not imitate a bot's request "from that site"; it is still just you!
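
The same kind of test can also be sketched without a browser by feeding sample UA strings to the detection logic directly. The `is_bot_ua` function below is a hypothetical, parameterized stand-in for the question's `is_bot()` with a shortened bot list; the two sample strings are a typical Googlebot user agent and a typical desktop Chrome user agent.

```php
<?php
// Hypothetical stand-in for is_bot(): same strpos loop, but the UA is passed in.
function is_bot_ua($userAgent)
{
    $botlist = array("Googlebot", "Slurp", "msnbot", "Baiduspider", "Twitterbot");
    foreach ($botlist as $bot) {
        if (strpos($userAgent, $bot) !== false) {
            return true;  // UA contains a known bot name
        }
    }
    return false;  // no bot name found
}

$samples = array(
    'crawler' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'browser' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
);

// Print the verdict for each sample UA string.
foreach ($samples as $label => $ua) {
    echo $label, ': ', is_bot_ua($ua) ? 'bot' : 'not a bot', PHP_EOL;
}
```

With these two samples, only the Googlebot string is reported as a bot, which mirrors what happens when a normal browser visits index.php.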

Answer 2 (score: 0)

The following code works 100% on my website:

function isBotDetected() {

    if ( preg_match('/abacho|accona|AddThis|AdsBot|ahoy|AhrefsBot|AISearchBot|alexa|altavista|anthill|appie|applebot|arale|araneo|AraybOt|ariadne|arks|aspseek|ATN_Worldwide|Atomz|baiduspider|baidu|bbot|bingbot|bing|Bjaaland|BlackWidow|BotLink|bot|boxseabot|bspider|calif|CCBot|ChinaClaw|christcrawler|CMC\/0\.01|combine|confuzzledbot|contaxe|CoolBot|cosmos|crawler|crawlpaper|crawl|curl|cusco|cyberspyder|cydralspider|dataprovider|digger|DIIbot|DotBot|downloadexpress|DragonBot|DuckDuckBot|dwcp|EasouSpider|ebiness|ecollector|elfinbot|esculapio|ESI|esther|eStyle|Ezooms|facebookexternalhit|facebook|facebot|fastcrawler|FatBot|FDSE|FELIX IDE|fetch|fido|find|Firefly|fouineur|Freecrawl|froogle|gammaSpider|gazz|gcreep|geona|Getterrobo-Plus|get|girafabot|golem|googlebot|\-google|grabber|GrabNet|griffon|Gromit|gulliver|gulper|hambot|havIndex|hotwired|htdig|HTTrack|ia_archiver|iajabot|IDBot|Informant|InfoSeek|InfoSpiders|INGRID\/0\.1|inktomi|inspectorwww|Internet Cruiser Robot|irobot|Iron33|JBot|jcrawler|Jeeves|jobo|KDD\-Explorer|KIT\-Fireball|ko_yappo_robot|label\-grabber|larbin|legs|libwww-perl|linkedin|Linkidator|linkwalker|Lockon|logo_gif_crawler|Lycos|m2e|majesticsEO|marvin|mattie|mediafox|mediapartners|MerzScope|MindCrawler|MJ12bot|mod_pagespeed|moget|Motor|msnbot|muncher|muninn|MuscatFerret|MwdSearch|NationalDirectory|naverbot|NEC\-MeshExplorer|NetcraftSurveyAgent|NetScoop|NetSeer|newscan\-online|nil|none|Nutch|ObjectsSearch|Occam|openstat.ru\/Bot|packrat|pageboy|ParaSite|patric|pegasus|perlcrawler|phpdig|piltdownman|Pimptrain|pingdom|pinterest|pjspider|PlumtreeWebAccessor|PortalBSpider|psbot|rambler|Raven|RHCS|RixBot|roadrunner|Robbie|robi|RoboCrawl|robofox|Scooter|Scrubby|Search\-AU|searchprocess|search|SemrushBot|Senrigan|seznambot|Shagseeker|sharp\-info\-agent|sift|SimBot|Site Valet|SiteSucker|skymob|SLCrawler\/2\.0|slurp|snooper|solbot|speedy|spider_monkey|SpiderBot\/1\.0|spiderline|spider|suke|tach_bw|TechBOT|TechnoratiSnoop|templeton|teoma|titin|topiclink|twitterbot|twitter|UdmSearch|Ukonline|UnwindFetchor|URL_Spider_SQL|urlck|urlresolver|Valkyrie libwww\-perl|verticrawl|Victoria|void\-bot|Voyager|VWbot_K|wapspider|WebBandit\/1\.0|webcatcher|WebCopier|WebFindBot|WebLeacher|WebMechanic|WebMoose|webquest|webreaper|webspider|webs|WebWalker|WebZip|wget|whowhere|winona|wlm|WOLP|woriobot|WWWC|XGET|xing|yahoo|YandexBot|YandexMobileBot|yandex|yeti|Zeus/i', $_SERVER['HTTP_USER_AGENT'])
    ) {
        return true; // 'Above given bots detected'
    }

    return false;

} // End :: isBotDetected()
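
A long alternation like the one above may be easier to maintain if it is built from an array of names. The sketch below assumes a hypothetical helper, `build_bot_pattern`, and uses only a handful of names taken from the list above; `preg_quote` takes care of escaping characters such as `/` and `.`, so the names can be stored unescaped.

```php
<?php
// Hypothetical helper: builds one case-insensitive alternation from a list of names.
function build_bot_pattern(array $names)
{
    $quoted = array();
    foreach ($names as $name) {
        // Escape regex metacharacters; '/' is passed because it is the delimiter.
        $quoted[] = preg_quote($name, '/');
    }
    return '/' . implode('|', $quoted) . '/i';
}

// Hypothetical wrapper: true when the UA matches the prepared pattern.
function is_bot_detected_ua($userAgent, $pattern)
{
    return preg_match($pattern, $userAgent) === 1;
}

// Shortened sample of names from the list above, stored without escaping.
$pattern = build_bot_pattern(array(
    'AhrefsBot', 'bingbot', 'CMC/0.01', 'Site Valet', 'openstat.ru/Bot',
));
```

For example, `is_bot_detected_ua($_SERVER['HTTP_USER_AGENT'], $pattern)` could then take the place of the inline `preg_match` call, assuming the header is present.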