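I have a website status checker that writes the most recently checked URLs to a log file (URL, status such as up or down, and the date of the check). The trouble is that I have now found it also records spider/Googlebot visits, so the latest site checks are being written several times per second...
Here is my log-writing function: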
public function log($url, $status) {
    if (strpos($url, "/") !== false):
        if (strpos($url, "http://") === false):
            $url = "http://" . $url;
        endif;
        $parse = parse_url($url);
        $url = $parse['host'];
    endif;
    if (!empty($url)):
        $arrayToWrite = array(
            array(
                "url" => $url,
                "status" => $status,
                "date" => date("m/d/Y h:i")
            )
        );
        if (file_exists($this->logfile)):
            $fileContents = file_get_contents($this->logfile);
            $arrayFromFile = unserialize($fileContents);
            foreach ($arrayFromFile as $k => $tmpArray):
                if ($tmpArray['url'] == $url):
                    unset($arrayFromFile[$k]);
                endif;
            endforeach;
            if (is_array($arrayFromFile)):
                array_splice($arrayFromFile, 9);
                $arrayToWrite = array_merge($arrayToWrite, $arrayFromFile);
            endif;
        endif;
        file_put_contents($this->logfile, serialize($arrayToWrite));
    endif;
}
What kind of modification can I make so that it ignores bot/spider visits and only tracks/writes real visitors?
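Answer 0 (score: 0)
Restating this answer: how to detect search engine bots with php?
You can use $_SERVER['HTTP_USER_AGENT'] to check whether the visitor identifies itself as a spider. Note that the header is supplied by the client, so it only catches bots that announce themselves.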
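$bots = array("googlebot", "msn", "add other bots");
$agent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
foreach ($bots as $bot) {
    // Substring match: the full User-Agent string is longer than the bot name
    if (strpos($agent, $bot) !== false) {
        // Don't save url
    }
}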
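To make the check reusable, it could be wrapped in a small helper and used to guard the call to log(). The following is only a minimal sketch: the function name isBot, the signature list, and the $checker variable are hypothetical and not part of the original code, and treating an empty User-Agent as a bot is an assumption you may want to change.

```php
<?php
// Hypothetical helper (not from the original post): returns true when the
// User-Agent contains any of the given bot signatures, case-insensitively.
function isBot($userAgent, array $signatures = array('googlebot', 'bingbot', 'msnbot', 'slurp', 'spider', 'crawler'))
{
    // Assumption: a missing or empty User-Agent is treated as a bot.
    if ($userAgent === null || $userAgent === '') {
        return true;
    }
    foreach ($signatures as $signature) {
        // stripos() performs a case-insensitive substring search.
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Usage sketch: only write to the log when the visitor does not look like a bot.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (!isBot($agent)) {
    // $checker->log($url, $status); // $checker: hypothetical instance of the asker's class
}
```

The signature list is illustrative rather than exhaustive, and a client can send any User-Agent it likes, so this filters out well-behaved crawlers only.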