Question

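A simple web scraping script I wrote a few weeks ago keeps failing with the following error: HTTP Error 429: Too Many Requests. The code is meant to read input from an Excel file, then search for and download PDFs online. I'm not very familiar with requests, but I slowed down the request rate to see how many requests it could handle. The delay seems to make no difference: whether I set it to 5 seconds or 20 seconds, the code gets through a similar number of inputs (around 30) before failing. This is the error message that keeps appearing: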

Traceback (most recent call last):
  File "D:\Python\New folder\Web Scraper.py", line 17, in <module>
    for url in search(searchquery, stop=1, pause=2):
  File "D:\Python\lib\site-packages\google-2.0.2-py3.7.egg\googlesearch\__init__.py", line 288, in search
    html = get_page(url, user_agent)
  File "D:\Python\lib\site-packages\google-2.0.2-py3.7.egg\googlesearch\__init__.py", line 154, in get_page
    response = urlopen(request)
  File "D:\Python\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "D:\Python\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "D:\Python\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\Python\lib\urllib\request.py", line 563, in error
    result = self._call_chain(*args)
  File "D:\Python\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "D:\Python\lib\urllib\request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "D:\Python\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "D:\Python\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\Python\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "D:\Python\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "D:\Python\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests

Here is the code I wrote:

import requests
import xlrd
from googlesearch import search

# Excel location
xlloc = "D:/VesselBase.xlsx"
wb = xlrd.open_workbook(xlloc)
# Sheet name/index
sheet = wb.sheet_by_index(0)

for i in range(sheet.nrows):
    # Column A (index 0) holds the IMO number, column B (index 1) the vessel name.
    vesselimo = sheet.cell_value(i, 0)
    vesselname = sheet.cell_value(i, 1)
    searchquery = 'Vessel specification information "%s" OR "%s" filetype:pdf' % (vesselname, vesselimo)
    print('Searching "%s"' % searchquery)
    for url in search(searchquery, stop=1, pause=20):
        print('Searched for %s' % vesselname)
        print('Found %s' % url)
        # Where to save; the with block closes the file after writing
        with open('D:/Newfolder/%s.pdf' % vesselname, 'wb') as f:
            f.write(requests.get(url).content)
        print('Saved %s' % vesselname)

I keep getting "HTTP Error 429: Too Many Requests".
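
For reference, one thing that could be tried here is catching the 429 and waiting before retrying, instead of relying only on a fixed pause. The following is a minimal sketch, not a confirmed fix: `call_with_backoff` is a hypothetical helper, the 60-second starting delay is a guess, and nothing guarantees the server will lift the block within that time, since the limit may be counted per IP address over a much longer window than a few seconds.

```python
import time
from urllib.error import HTTPError


def call_with_backoff(func, max_retries=5, base_delay=60.0, sleep=time.sleep):
    """Call func(); on HTTP 429, wait and retry with exponentially growing delays.

    Hypothetical helper for illustration. base_delay is an assumed value.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except HTTPError as err:
            # Re-raise anything that is not a 429, or a 429 on the last attempt.
            if err.code != 429 or attempt == max_retries:
                raise
            # Prefer the server's Retry-After header when it is a number of
            # seconds (the HTTP-date form of the header is ignored here).
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.strip().isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt  # 60 s, 120 s, 240 s, ...
            sleep(delay)
```

Because `search()` is a generator and, per the traceback, the error is raised while iterating over it, the whole iteration would have to run inside the retried call, for example `urls = call_with_backoff(lambda: list(search(searchquery, stop=1, pause=20)))`, followed by a loop over `urls` for the download step.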

0 answers: