Question

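I have a few pages and URLs that I don't want the Google crawler to crawl.

I know this can be done with robots.txt. I searched Google and found that the entries in robots.txt need to be arranged like this to disallow crawling, but I'm not sure whether it is correct.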

User-Agent: *
Disallow: /music?
Disallow: /widgets/radio?

Disallow: /affiliate/
Disallow: /affiliate_redirect.php
Disallow: /affiliate_sendto.php
Disallow: /affiliatelink.php
Disallow: /campaignlink.php
Disallow: /delivery.php

Disallow: /music/+noredirect/
Disallow: /user/*/library/music/
Disallow: /*/+news/*/visit
Disallow: /*/+wiki/diff

# AJAX content
Disallow: /search/autocomplete
Disallow: /template
Disallow: /ajax
Disallow: /user/*/tasteomatic

Can I give a URL like this? I mean, can I specify a full URL in a Disallow line?

Disallow: http://www.bba-reman.com/admin/feedback.htm

Edit

My current robots.txt entries look like this:

User-Agent: *
Disallow: /CheckLogin
Disallow: /DTC.pdf
Disallow: /catalogue/bmw.htm
Disallow: /auto-mine/bmw/index.htm
Disallow: /forums/parent.Jmp('i100')
Disallow: /forums/parent.Jmp('i040')
Disallow: /forums/CodeDescriptions.html
Disallow: /forums/parent.Jmp('i050')
Disallow: /forums/parent.Scl('000','24601')
Disallow: /forums/parent.Jmp('i030')
Disallow: /catalogue/peugeot.htm

Is this OK? Please let me know. Thanks.

Answer 1

The value of the Disallow field is always the beginning of the URL path.

So if your robots.txt is accessible at http://example.com/robots.txt and it contains this line

Disallow: http://example.com/admin/feedback.htm

then URLs like these are disallowed:

http://example.com/http://example.com/admin/feedback.htm

http://example.com/http://example.com/admin/feedback.html

http://example.com/http://example.com/admin/feedback.htm_foo

http://example.com/http://example.com/admin/feedback.htm/bar

...

So if you want to disallow the URL http://example.com/admin/feedback.htm, you have to use

Disallow: /admin/feedback.htm

which would block URLs like these:

http://example.com/admin/feedback.htm

http://example.com/admin/feedback.html

http://example.com/admin/feedback.htm_foo

http://example.com/admin/feedback.htm/bar

...
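
To make the prefix rule concrete, here is a minimal Python sketch. The helper name `is_disallowed` is hypothetical, and it only does plain prefix matching on the URL path; it is not how any particular crawler is implemented, and it ignores the `*` and `$` wildcards that some crawlers support.

```python
from urllib.parse import urlsplit


def is_disallowed(url, disallow_values):
    """Return True if the URL's path begins with any Disallow value.

    Simplified illustration: plain prefix matching only, no wildcards.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # An empty Disallow value blocks nothing, so it is skipped.
    return any(v and path.startswith(v) for v in disallow_values)


rules = ["/admin/feedback.htm"]
for url in [
    "http://example.com/admin/feedback.htm",
    "http://example.com/admin/feedback.html",
    "http://example.com/admin/feedback.htm_foo",
    "http://example.com/admin/feedback.htm/bar",
    "http://example.com/admin/",
]:
    print(url, is_disallowed(url, rules))
```

With the full URL as the Disallow value, `is_disallowed("http://example.com/admin/feedback.htm", ["http://example.com/admin/feedback.htm"])` returns False in this sketch, because the path `/admin/feedback.htm` does not start with `http://`.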

