I am making a program that uses Google search, but I cannot get past an HTTP Error 403. Is there any way around it, or is it something in my use of mechanize? Here is my code:
from mechanize import Browser

# Read the search term from the user.
word = raw_input("Enter Word: ")

SEARCH_PAGE = "https://www.google.com/"

browser = Browser()
browser.open(SEARCH_PAGE)
browser.select_form(nr=0)  # the first form on the page is the search box
browser['q'] = word        # fill in the query field
browser.submit()           # this is the line that raises HTTP Error 403
This is the error message:
Traceback (most recent call last):
File "C:\Python27\Project\Auth2.py", line 16, in <module>
browser.submit()
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 541, in submit
return self.open(self.click(*args, **kwds))
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 255, in _mech_open
raise response
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
Please help, and thank you.
Answer 0 (score: 6)
You can tell Mechanize to ignore the robots.txt file:
browser.set_handle_robots(False)
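A minimal sketch of how this fits into the question's script. The User-Agent header is an extra precaution that is not part of this answer: Google sometimes rejects mechanize's default User-Agent even after robots handling is disabled, and the header value shown is just an illustrative browser string.

from mechanize import Browser

browser = Browser()
browser.set_handle_robots(False)  # do not fetch or obey robots.txt
# Assumption: a browser-like User-Agent; not required by this answer,
# but Google may block mechanize's default one.
browser.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36')]

word = raw_input("Enter Word: ")
browser.open("https://www.google.com/")
browser.select_form(nr=0)
browser['q'] = word
response = browser.submit()       # no longer raises the robots.txt 403
print response.read()[:500]       # first 500 bytes of the results page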
Answer 1 (score: 2)
Mechanize tries to honor the crawling restrictions that a site declares in its /robots.txt file. Here, Google does not want crawlers indexing its search pages.
You can ignore this restriction:
browser.set_handle_robots(False)
as described in Web Crawler - Ignore Robots.txt file?
Additionally, I would suggest using Google's Custom Search API instead, which exposes a proper API with results that are easy to parse.
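For reference, a minimal sketch of calling the Custom Search JSON API with only the standard library. The endpoint and the key/cx/q parameters are the documented ones; API_KEY, ENGINE_ID, and the google_search helper are placeholders you would fill in from your own Google developer console setup.

import json
import urllib
import urllib2

API_KEY = "YOUR_API_KEY"      # placeholder: create one in the Google API console
ENGINE_ID = "YOUR_ENGINE_ID"  # placeholder: your custom search engine's cx value

def google_search(query):
    # Build the documented Custom Search JSON API request URL.
    params = urllib.urlencode({'key': API_KEY, 'cx': ENGINE_ID, 'q': query})
    url = "https://www.googleapis.com/customsearch/v1?" + params
    data = json.load(urllib2.urlopen(url))
    # Each result item carries a title, link, and snippet, among other fields.
    return [(item['title'], item['link']) for item in data.get('items', [])]

word = raw_input("Enter Word: ")
for title, link in google_search(word):
    print title, "-", link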