Question

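I'm learning BeautifulSoup and trying to write a small script that finds houses on a Dutch real-estate website. As soon as I try to fetch the site's content, I get an HTTP 405 error: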

  File "funda.py", line 2, in <module>
    html = urlopen("http://www.funda.nl")
  File "<folders>request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "<folders>request.py", line 532, in open
    response = meth(req, response)
  File "<folders>request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "<folders>request.py", line 570, in error
    return self._call_chain(*args)
  File "<folders>request.py", line 504, in _call_chain
    result = func(*args)
  File "<folders>request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 405: Not Allowed

This is what I'm trying to run:

from urllib.request import urlopen
html = urlopen("http://www.funda.nl")

Any idea why this causes an HTTP 405? I'm making a GET request, right?
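On the "GET request" part of the question: urllib picks the method from whether a body is supplied, so the call above is indeed a GET. The sketch below checks that without touching the network and wraps urlopen so an error response is returned as a status code instead of a traceback. The helper name fetch_status and the POST body are hypothetical, used only for illustration.

```python
import urllib.request
from urllib.error import HTTPError

url = "http://www.funda.nl"

# A Request built without a data argument uses the GET method.
req = urllib.request.Request(url)
print(req.get_method())  # GET

# Supplying a request body (a made-up one here) switches the method to POST.
post_req = urllib.request.Request(url, data=b"q=amsterdam")
print(post_req.get_method())  # POST


def fetch_status(request, timeout=10):
    """Hypothetical helper: return (status, reason), including error responses.

    urlopen raises HTTPError for 4xx/5xx responses; catching it exposes the
    status code and reason phrase the server sent back.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.reason
    except HTTPError as err:
        return err.code, err.reason
```

If the site still behaves as in the traceback, fetch_status(req) would be expected to return (405, 'Not Allowed'). That suggests the server is rejecting the client rather than the method.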

Answer 1

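Possible duplicate of HTTPError: HTTP Error 403: Forbidden. You need to make the request look like it comes from a regular visitor. This is usually done (it varies from site to site) by sending a common, browser-style User-Agent HTTP header.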
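To see what "pretending to be a regular visitor" changes: urllib's default opener identifies itself as Python-urllib/<version>, and the fix replaces that header. This is a minimal sketch that builds such a request offline. The helper name browser_request is hypothetical; the browser string is the one used in this answer, and whether a given site accepts it is an assumption that can change over time.

```python
import urllib.request

# The default opener sends a User-Agent of the form "Python-urllib/<major>.<minor>",
# which some sites reject.
default_ua = dict(urllib.request.build_opener().addheaders)["User-agent"]
print(default_ua)

# Browser string taken from this answer; other current browser strings should behave similarly.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
)


def browser_request(url):
    """Hypothetical helper: build a Request with a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


req = browser_request("http://www.funda.nl")
# urllib normalizes header names with str.capitalize(), so look it up as "User-agent".
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen then sends the browser string instead of the Python-urllib default.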

>>> url = "http://www.funda.nl"
>>> import urllib.request
>>> req = urllib.request.Request(
...     url, 
...     data=None, 
...     headers={
...         'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
...     }
... )
>>> f = urllib.request.urlopen(req)
>>> f.status, f.msg
(200, 'OK')
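If you would rather keep the question's plain urlopen(url) one-liner, one option is to install a global opener whose default headers carry the browser string. This is a sketch, not part of the original answer; note that install_opener changes urllib's behavior for the whole process, so per-request headers are usually the safer choice inside a library.

```python
import urllib.request

BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
)

# Build an opener whose default headers replace urllib's Python-urllib User-Agent.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", BROWSER_UA)]

# After install_opener, plain urllib.request.urlopen(url) calls go through this opener.
urllib.request.install_opener(opener)

# html = urllib.request.urlopen("http://www.funda.nl")  # network call, left commented out
```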

Using the requests library:

>>> import requests
>>> response = requests.get(
...     url,
...     headers={
...         'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
...     }
... )
>>> response.status_code
200

Answer 2

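If you're not using Requests or urllib2, you can use the following (this is Python 2's urllib module; in Python 3, urlopen lives in urllib.request):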

import urllib
html = urllib.urlopen("http://www.funda.nl")

leovp's comment makes sense.

Fetching a website with urllib causes an HTTP 405 error
