I am trying to block annoying bots by user agent. I put this into the server section of the nginx config:
server {
    listen 80 default_server;
    ....
    if ($http_user_agent ~* (AhrefsBot)) {
        return 444;
    }
Checking with curl:
[root@vm85559 site_avaliable]# curl -I -H 'User-agent: Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)' localhost/
curl: (52) Empty reply from server
Then I checked /var/log/nginx/access.log and saw that some connections get 444, but others get 200!
51.255.65.78 - - [25/Jun/2017:15:47:36 +0300 - -] "GET /product/kovriki-avtomobilnie/volkswagen/?PAGEN_1=10 HTTP/1.1" 444 0 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)" 1498394856.155
217.182.132.60 - - [25/Jun/2017:15:47:50 +0300 - 2.301] "GET /product/bryzgoviki/toyota/ HTTP/1.1" 200 14500 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)" 1498394870.955
How is that possible?
Answer:
OK, got it! I added $server_name and $server_addr to the nginx log format and saw that the cunning bot connects by IP address, without a server_name:
51.255.65.40 - _ *myip* - [25/Jun/2017:16:22:27 +0300 - 2.449] "GET /product/soyuz_96_2/mitsubishi/l200/ HTTP/1.1" 200 9974 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)" 1498396947.308
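A log format along the following lines would produce entries shaped like the one above. This is a sketch, not the poster's actual configuration: the format name `bot_debug` is hypothetical, and the field order is inferred from the sample lines (in particular, the value after the timestamp is assumed to be `$upstream_response_time`, since it shows `-` for the 444 response).

```nginx
# Sketch only: belongs in the http { } context.
# "bot_debug" is a hypothetical format name; field order approximates the log lines above.
log_format bot_debug '$remote_addr - $server_name $server_addr $remote_user '
                     '[$time_local - $upstream_response_time] "$request" '
                     '$status $body_bytes_sent "$http_referer" '
                     '"$http_user_agent" $msec';

# $server_name = name of the server block that accepted the request,
# $server_addr = local address the connection arrived on.
access_log /var/log/nginx/access.log bot_debug;
```

Logging `$server_name` and `$server_addr` is what makes the difference visible: requests that hit the `if` check and requests that bypass it are handled by different server blocks.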
So I added this, and the bot can no longer connect:
server {
    listen *myip*:80;
    server_name _;
    return 403;
}
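The fix works because nginx first picks the candidate server blocks by the address and port in their `listen` directives, and only then matches the Host header against `server_name`. A block that listens on a specific address such as `*myip*:80` takes precedence over one listening on plain port 80, even if the latter is marked `default_server`. A different approach, sketched below under assumptions, is to define the user-agent test once with `map` and reuse it in every server block. The variable `$block_bot` and the domain `example.com` are hypothetical. If any block listens on a specific IP, the catch-all must listen on that same address, as in the answer.

```nginx
# Sketch only: belongs in the http { } context.
# $block_bot and example.com are hypothetical names used for illustration.
map $http_user_agent $block_bot {
    default      0;
    ~*AhrefsBot  1;   # case-insensitive regex, same test as the original if
}

# Catch-all for requests whose Host matches no server_name (e.g. the bare IP).
server {
    listen 80 default_server;
    server_name _;
    return 444;
}

server {
    listen 80;
    server_name example.com;

    # Repeat this check in every server block that serves real content.
    if ($block_bot) {
        return 444;   # nginx-specific: close the connection without a response
    }

    # ... rest of the site configuration ...
}
```

Because `map` is evaluated lazily, the regex only runs in server blocks that actually reference `$block_bot`.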