Prevent writing to the log/file if the visitor is a spider?

Time: 2017-03-01 17:03:34

Tags: php

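I have a website status checker that writes the most recently checked URLs to a log file (URL, status such as up or down, and the date of the check). The trouble is that I have now found it also logs spider/Googlebot visits, so the list of latest site checks gets written multiple times per second...

Here is my log-writing function: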

public function log($url, $status) {
    if (strpos($url, "/") !== false):
        if (strpos($url, "http://") === false):
            $url = "http://" . $url;
        endif;
        $parse = parse_url($url);
        $url = $parse['host'];
    endif;
    if (!empty($url)):
        $arrayToWrite = array(
            array(
                "url" => $url,
                "status" => $status,
                "date" => date("m/d/Y h:i")
            )
        );
        if (file_exists($this->logfile)):
            $fileContents = file_get_contents($this->logfile);
            $arrayFromFile = unserialize($fileContents);
            // Check before iterating: unserialize() returns false on empty or corrupt data
            if (is_array($arrayFromFile)):
                foreach ($arrayFromFile as $k => $tmpArray):
                    if ($tmpArray['url'] == $url):
                        unset($arrayFromFile[$k]);
                    endif;
                endforeach;
                // Keep the 9 newest old entries; with the new one that makes 10
                array_splice($arrayFromFile, 9);
                $arrayToWrite = array_merge($arrayToWrite, $arrayFromFile);
            endif;
        endif;
        file_put_contents($this->logfile, serialize($arrayToWrite));
    endif;
}

What kind of modification can I make so that it ignores bot/spider visits and only tracks/writes real visitors?

1 Answer:

Answer 0 (score: 0)

Restating this answer: how to detect search engine bots with php?

You can use $_SERVER['HTTP_USER_AGENT'] to check whether the visitor identifies itself as a spider.

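$bots = array("googlebot", "msn"); // add other bots here
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// in_array() only matches when the whole User-Agent equals a list entry,
// so search for the bot names as case-insensitive substrings instead
if (preg_match('/' . implode('|', array_map('preg_quote', $bots)) . '/i', $userAgent)) {
     // Don't save url
}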
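
To connect this check to the logging function from the question, one option is to wrap it in a small helper and call it before logging. The following is only a minimal sketch: the function name isBotUserAgent, the marker list, and the $checker variable are hypothetical and not part of the original code, and treating a missing User-Agent as a real visitor is an assumption you may want to reverse.

```php
<?php
// Hypothetical helper (not part of the original class): returns true when the
// User-Agent string contains any of the given bot markers, case-insensitively.
function isBotUserAgent($userAgent, array $botMarkers)
{
    if (!is_string($userAgent) || $userAgent === '') {
        // Assumption: a missing User-Agent is treated as "not a bot"
        return false;
    }
    foreach ($botMarkers as $marker) {
        if (stripos($userAgent, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Illustrative marker list only; extend it from a maintained list of spiders.
$botMarkers = array('googlebot', 'bingbot', 'msnbot', 'slurp');

$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (!isBotUserAgent($userAgent, $botMarkers)) {
    // Only visitors that do not identify as a bot reach this point,
    // e.g. $checker->log($url, $status); ($checker is a placeholder name)
}
```

The same guard could instead go at the top of log() as an early return. Keep in mind that the User-Agent header is self-reported, so this filters only crawlers that identify themselves.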

A List of Spiders