Question

I am trying to scrape the following link using BeautifulSoup and Python.

my_url = 'https://www.idealo.de/preisvergleich/OffersOfProduct/5577980_-macbook-air-13-2017-apple.html'

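The problem is that every time I try to scrape the page, the server returns error 429. One suggestion was to send header information, but that did not work. The code I am using is shown below.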

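from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

# Placeholder User-Agent string
hdr = {'User-Agent': 'text for header'}
req = Request(my_url, data=None, headers=hdr)
try:
    page = urlopen(req)
except HTTPError as e:
    # The server responded, but with an error status such as 429
    print("The server couldn't fulfill the request.")
    print('Error code:', e.code)
except URLError as e:
    # No response at all (DNS failure, refused connection, ...)
    print('We failed to reach a server.')
    print('Reason:', e.reason)
else:
    # everything is fine
    print("All good")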

What options do I have in this case? Thanks.

Answer 1

Everything works fine for me with both requests and urllib.request.urlopen.

from urllib.request import urlopen
import requests

url = 'https://www.idealo.de/preisvergleich/OffersOfProduct/5577980_-macbook-air-13-2017-apple.html'

>>> r = requests.get(url)
>>> r.status_code
200

>>> with urlopen(url) as response:
...     print(response.status)
200

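Note that "receiving a status 429 is not an error": it is the server asking the client to slow down its requests. You can read more about it here.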

HTTP error 429 (Too Many Requests)
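Since 429 means the server wants fewer requests, one option is to wait and retry instead of treating the response as fatal. The following is only a minimal sketch under some assumptions: the helper names `retry_after_seconds` and `fetch_with_backoff`, the `max_attempts` value, and the exponential fallback delay are all made up for illustration, and the server may or may not send a `Retry-After` header at all.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def retry_after_seconds(header_value, default=5.0):
    """Turn a Retry-After header (seconds or an HTTP date) into a delay.

    Hypothetical helper: falls back to `default` when the header is
    missing or cannot be parsed.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def fetch_with_backoff(url, headers=None, max_attempts=4,
                       opener=urlopen, sleep=time.sleep):
    """Fetch `url`, pausing and retrying whenever the server answers 429.

    `opener` and `sleep` are injectable only so the function can be
    exercised without network access; they are not part of urllib.
    """
    for attempt in range(max_attempts):
        try:
            with opener(Request(url, headers=headers or {})) as response:
                return response.read()
        except HTTPError as e:
            # Re-raise anything that is not rate limiting, and give up
            # once the last attempt has been used.
            if e.code != 429 or attempt == max_attempts - 1:
                raise
            # Prefer the server's hint; otherwise back off 1s, 2s, 4s, ...
            fallback = 2.0 ** attempt
            sleep(retry_after_seconds(e.headers.get("Retry-After"), fallback))
```

With the question's variables this could be called as `html = fetch_with_backoff(my_url, headers=hdr)`, and the returned bytes passed to BeautifulSoup. If the site keeps answering 429 even after long pauses, retrying will not help and the request rate itself needs to come down.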
