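I'm web-scraping an API directory in a Jupyter notebook. My code works fine, grabbing the name, URL, category, and description of each API, until it has been running for a few minutes. Then it stops with the error "TooManyRedirects: Exceeded 30 redirects."
This is my first time trying this, and I'm not sure how to fix it.
Here is my full code: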
```python
from bs4 import BeautifulSoup
import requests
import pandas as pd

BASE = 'https://www.programmableweb.com'
# Starting page of the directory. The post never sets `url` before the loop;
# this value is an assumption.
url = BASE + '/category/all/apis'

API = {}   # maps a running counter to [name, URL, category, description]
API_no = 0

while True:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Each API in the listing is one table row with class "even" or "odd".
    interfaces = soup.find_all('tr', {'class': ['even', 'odd']})
    for interface in interfaces:
        api_title = interface.find('td', {'class': 'views-field views-field-title col-md-3'}).text
        api_url = BASE + interface.find('a').get('href')
        api_category_tag = interface.find('td', {'class': 'views-field views-field-field-article-primary-category'})
        api_category = api_category_tag.text if api_category_tag else 'N/A'
        # One extra request per API to fetch its detail page and description.
        interface_response = requests.get(api_url)
        interface_soup = BeautifulSoup(interface_response.text, 'html.parser')
        interface_description_tag = interface_soup.find('div', {'class': 'api_description tabs-header_description'})
        interface_description = interface_description_tag.text if interface_description_tag else 'N/A'
        API_no += 1
        API[API_no] = [api_title, api_url, api_category, interface_description]
        print('API Name:', api_title, '\nURL:', api_url, '\nAPI Category:', api_category,
              '\nDescription:', interface_description, '\n---')
    # Follow the "next page" link; stop when there is none (find() returns None on the last page).
    url_tag = soup.find('a', {'title': 'Go to next page'})
    if url_tag and url_tag.get('href'):
        url = BASE + url_tag.get('href')
        print(url)
    else:
        break

print('Total APIs:', API_no)
API_df = pd.DataFrame.from_dict(API, orient='index',
                                columns=['API Name', 'URL', 'API Category', 'Description'])
```
I expect the code to eventually work through about 21,000 APIs, but it stops because of the redirect error. Any suggestions on how to fix this?
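One common way to keep a long scrape alive is to retry a failing request after a pause and skip the page if it keeps failing, instead of letting one exception end the whole run. The sketch below assumes the redirect loop is transient (for example, the site bouncing a client that sends requests too quickly); that cause is not confirmed in the post. `fetch_with_retries` is a hypothetical helper, and it takes the fetch function and the exception types as parameters so it does not depend on any particular HTTP library.

```python
import time
from typing import Callable, Optional, Tuple, Type


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 5.0,
) -> Optional[str]:
    """Call fetch(url), retrying on the given exception types.

    Hypothetical helper: returns the fetched text, or None if every
    attempt raised one of the exceptions in `retry_on`.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except retry_on as exc:
            print(f'Attempt {attempt}/{attempts} failed for {url}: {exc!r}')
            if attempt < attempts:
                # Wait a little longer after each failure before trying again.
                time.sleep(delay * attempt)
    return None
```

In the loop above, a call such as `fetch_with_retries(lambda u: requests.get(u).text, api_url, retry_on=(requests.exceptions.TooManyRedirects,))` would return `None` for a page that keeps redirecting, and the loop could `continue` to the next API instead of crashing. Adding a short `time.sleep` between ordinary requests may also reduce how often the error appears in the first place.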