Web scraping with mechanize: error code

Date: 2014-02-23 11:24:42

Tags: python mechanize

import mechanize

br = mechanize.Browser()
r = br.open("http://www.drugs.com/search-wildcard-phonetic.html")
br.select_form(nr=0)               # select the first form on the page
br.form['searchterm'] = 'panadol'  # fill in the search field
br.submit()
print br.response().read()

Error raised by the above code:
Traceback (most recent call last):
  File "mech2.py", line 6, in <module>
    br.submit()
  File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 541, in submit
    return self.open(self.click(*args, **kwds))
  File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Please help me correct the above code.

1 answer:

Answer 0 (score: 0)

There is nothing wrong with your code itself. Your error message

  mechanize._response.httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

means you are violating the site's robots.txt file. If you do not want to see this error message, stop requesting pages the site has disallowed, and consider contacting the site about an acceptable way to use its data.
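To see why mechanize raises this 403 before even sending the request, you can check a URL against robots.txt rules yourself with the standard library's `urllib.robotparser`. The `robots_txt` content below is a hypothetical example (not the actual rules served by drugs.com), written so that it disallows the page the question tries to fetch:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: a rule that disallows
# the search page for all user agents.
robots_txt = """\
User-agent: *
Disallow: /search-wildcard-phonetic.html
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The disallowed page is blocked for any crawler...
print(rp.can_fetch("mechanize-bot", "http://www.drugs.com/search-wildcard-phonetic.html"))

# ...while a URL not matched by any Disallow rule is allowed.
print(rp.can_fetch("mechanize-bot", "http://www.drugs.com/"))
```

mechanize performs this same check automatically and refuses disallowed URLs with the HTTP 403 shown in the traceback. It does expose `br.set_handle_robots(False)` to skip the check, but as the answer says, that should only be used with the site's permission.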