DoS attack from a Google IP range

Time: 2014-11-25 22:05:10

Tags: google-app-engine googlebot ddos spoofing

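I believe I am under attack from repeated requests (about 5 requests per second, all day long) coming from a Google IP range (66.249.65.*; possibly IP spoofing?). The requests carry the Googlebot signature in the HTTP User-Agent header (Googlebot/2.1; +http://www.google.com/bot.html), but they try to fetch an old URL (which I disabled because it was already consuming a lot of CPU / $). If I blacklist this IP range, I will also block the legitimate Googlebot :(.

The irony: my app (http://expoonews.com) is hosted on Google App Engine!

How can I stop this behaviour without blocking Googlebot?

A sample of my logs follows, for better understanding.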

 A 2014-11-25 19:41:19.145 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Flincoln.pioneer.kohalibrary.com%2Fcgi-bin%2Fkoha%2Fopac-search.pl%3Fidx%3Disbn%26q%3D1842172131%26do%3DSearch
66.249.65.82 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Flincoln.pioneer.kohalibrary.com%2Fcgi-bin%2Fkoha%2Fopac-search.pl%3Fidx%3Disbn%26q%3D1842172131%26do%3DSearch HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:19.550 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fwww.dnevniavaz.ba%2Fkultura%2Ffilm%2Fprica-o-hapsenju-ratnog-zlocinca
66.249.65.86 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Fwww.dnevniavaz.ba%2Fkultura%2Ffilm%2Fprica-o-hapsenju-ratnog-zlocinca HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:19.956 404 234 B 12ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNewcastle_Local_Municipality
66.249.65.78 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNewcastle_Local_Municipality HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=12 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:20.426 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Ftools.wmflabs.org%2Fgeohack%2Fgeohack.php%3Fpagename%3DRio_Grande_County%252C_Colorado%26params%3D37.61_N_-106.39_E_type%3Aadm2nd_region%3AUS-CO_source%3AUScensus1990
66.249.65.86 - - [25/Nov/2014:13:41:20 -0800] "GET /AddPageAction?url=http%3A%2F%2Ftools.wmflabs.org%2Fgeohack%2Fgeohack.php%3Fpagename%3DRio_Grande_County%252C_Colorado%26params%3D37.61_N_-106.39_E_type%3Aadm2nd_region%3AUS-CO_source%3AUScensus1990 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:20.763 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2F%23cite_ref-Istanbul_43-1
66.249.65.86 - - [25/Nov/2014:13:41:20 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2F%23cite_ref-Istanbul_43-1 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:21.166 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DHMAS%2520Pirie%26action%3Dhistory
66.249.65.86 - - [25/Nov/2014:13:41:21 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DHMAS%2520Pirie%26action%3Dhistory HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:21.571 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DUniversity_of_Engineering_and_Technology_Taxila_Chakwal_Campus_University_of_Engineering_and_Technology_Taxila_Chakwal_Campus%26action%3Dedit%26redlink%3D1
66.249.65.78 - - [25/Nov/2014:13:41:21 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DUniversity_of_Engineering_and_Technology_Taxila_Chakwal_Campus_University_of_Engineering_and_Technology_Taxila_Chakwal_Campus%26action%3Dedit%26redlink%3D1 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16 

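6 answers:

Answer 0 (score: 1)

It looks like Googlebot is picking up injected links. They are either already stored on your own site, or hard-coded on the site of an attacker who is using Googlebot to launch the attack.

A web application firewall could be a good solution for you: it can detect these signatures and explicitly reject such requests.

Search Google for Apache ModSecurity or Nginx NAXSI!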
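ModSecurity and NAXSI are modules for Apache and Nginx. On App Engine, as far as I know, you cannot normally place your own web server module in front of the app, so a rule of this kind may have to live in the application itself. Below is a minimal sketch in Python, assuming the app is a WSGI application. The name `block_retired_paths` and the 410 response are illustrative choices, not part of any library, and `/AddPageAction` is simply the path seen in the question's logs.

```python
# Hypothetical WSGI middleware: reject requests for retired paths before
# they reach the real application. Not a replacement for a real WAF.
RETIRED_PREFIXES = ("/AddPageAction",)  # path taken from the question's logs


def block_retired_paths(app, prefixes=RETIRED_PREFIXES):
    """Wrap a WSGI app so that requests for retired paths get 410 Gone."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(prefixes):
            body = b"Gone"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("410 Gone", headers)
            return [body]
        # Anything else is passed through untouched.
        return app(environ, start_response)

    return middleware
```

The requests would still reach an instance, so this only keeps each rejection cheap; it does not stop the traffic.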

Answer 1 (score: 0)

You can try using robots.txt (http://www.robotstxt.org/robotstxt.html)

to disallow that particular directory or page.
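As an illustration, a rule for the path in the question's logs might look like the `RULES` string in the sketch below. The sketch uses Python's standard `urllib.robotparser` to check which URLs such a rule would block. The helper name `is_allowed` is my own. Keep in mind that only well-behaved crawlers honour robots.txt, so a spoofed bot would simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block the retired path for every crawler.
RULES = """\
User-agent: *
Disallow: /AddPageAction
"""


def is_allowed(url, user_agent="Googlebot", rules=RULES):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `is_allowed("http://expoonews.com/AddPageAction?url=x")` evaluates the retired path, while `is_allowed("http://expoonews.com/")` evaluates the home page.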

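Answer 2 (score: 0)

A dos.yaml file in the application root directory (alongside app.yaml) configures the DoS Protection Service blacklist for your application. The following is an example dos.yaml file: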

blacklist:
- subnet: 1.2.3.4
  description: a single IP address
- subnet: 1.2.3.4/24
  description: an IPv4 subnet
- subnet: abcd::123:4567
  description: an IPv6 address
- subnet: abcd::123:4567/48
  description: an IPv6 subnet

https://cloud.google.com/appengine/docs/python/config/dos
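Before adding a subnet to the blacklist, it may help to check which addresses it would cover. The sketch below uses Python 3's standard `ipaddress` module and assumes the subnet values follow ordinary CIDR notation. The helper name `covered_by` is my own, and the addresses in the usage note are the ones from the question's logs.

```python
import ipaddress


def covered_by(subnet, addresses):
    """Return the addresses that fall inside the given CIDR subnet."""
    # strict=False accepts values with host bits set, such as "1.2.3.4/24".
    network = ipaddress.ip_network(subnet, strict=False)
    return [ip for ip in addresses if ipaddress.ip_address(ip) in network]
```

Calling `covered_by("66.249.65.0/24", ["66.249.65.82", "66.249.65.86", "66.249.65.78"])` returns all three addresses, which is the asker's concern: a /24 entry for that range would also block the genuine Googlebot if those addresses are real.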

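Answer 3 (score: 0)

You should write a robots.txt, at least to stop the genuine Googlebot from accessing the old URL. It keeps trying to visit indexed URLs frequently until the URL returns 404 or is otherwise marked as removed.

I'm not sure it really is a fake bot, because Googlebot itself can behave like a spammer, visiting too many pages in a short time.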
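One way to tell a genuine Googlebot from a spoofed one is the check Google describes for verifying its crawler: do a reverse DNS lookup on the IP, confirm the host name ends in googlebot.com or google.com, then do a forward lookup and confirm it returns the same IP. The following is a minimal Python sketch of that check. The function name and the injectable lookup parameters are my own (they make it testable without network access), and outbound DNS may not be available in every App Engine runtime.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip,
                         reverse_lookup=socket.gethostbyaddr,
                         forward_lookup=socket.gethostbyname_ex):
    """Return True if ip passes the reverse-then-forward DNS check."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        host = reverse_lookup(ip)[0]
    except (socket.herror, socket.gaierror):
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # gethostbyname_ex returns (hostname, aliases, addresses).
        addresses = forward_lookup(host)[2]
    except (socket.herror, socket.gaierror):
        return False
    # The forward lookup must lead back to the original address.
    return ip in addresses
```

Because each check costs two DNS lookups, caching the result per IP would probably be sensible in practice.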

To reduce access from Googlebot (fake or genuine), how about this?

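from google.appengine.api import memcache

# Runs inside a webapp2 request handler; bot_ip is the client address,
# for example self.request.remote_addr.
# Allows roughly 100 requests per minute per IP.
dos_n = memcache.get(key=bot_ip)
if dos_n is not None:
    if dos_n > 100:
        self.abort(400)
    memcache.incr(bot_ip)
else:
    # First request in this window: create a counter that expires after 60 s.
    memcache.add(key=bot_ip, value=0, time=60)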

And just for information: if the host is not on GAE, you can change the crawl rate in Webmaster Tools. https://www.google.com/webmasters/tools/

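Answer 4 (score: 0)

I think I solved this problem by removing the URL that accepted a parameter (a URL pointing to another page).

I think the bots were trying to find out which URLs were open so that they could fake visits to specific websites (perhaps to inflate their visit counts). My URL was clearly exposed, since it simply took the address as a GET parameter.

Thanks to everyone who answered.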

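Answer 5 (score: -1)

This suspicious behaviour is related to Googlebot crawling the pages at your URLs. If you have recently added or changed pages on your site, you can ask Google to (re)index them using the Fetch as Google tool.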