How to resolve the "Exceeded 30 redirects" error with Beautiful Soup and Requests

Time: 2019-07-14 13:32:37

Tags: python python-3.x web-scraping beautifulsoup python-requests

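I am scraping an API directory from a Jupyter notebook. My code works well, collecting each API's name, URL, category, and description, but after running for a few minutes it fails with the error "TooManyRedirects: Exceeded 30 redirects".

This is my first attempt at web scraping, and I am not sure how to fix this.

Here is my complete code: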

from bs4 import BeautifulSoup
import requests
import pandas as pd

d = {'key':'value'}
print(d)

d['new key'] = 'new value'
print(d)

API = {}
API_no = 0

# `url` must already hold the address of the first directory page.
while True:
    response = requests.get(url)
    data = response.text
    soup = BeautifulSoup(data,'html.parser')
    interfaces = soup.find_all('tr',{'class':['even','odd']})

    for interface in interfaces:
        api_title = interface.find('td',{'class':'views-field views-field-title col-md-3'}).text
        api_url = 'https://www.programmableweb.com' + interface.find('a').get('href')
        api_category_tag = interface.find('td',{'class':'views-field views-field-field-article-primary-category'})
        api_category = api_category_tag.text if api_category_tag else "N/A"

        interface_response = requests.get(api_url)
        interface_data = interface_response.text
        interface_soup = BeautifulSoup(interface_data,'html.parser')
        interface_description_tag = interface_soup.find('div',{'class':'api_description tabs-header_description'})
        interface_description = interface_description_tag.text if interface_description_tag else "N/A"


        API_no+=1
        API[API_no] = [api_title, api_url, api_category, interface_description]

        print('API Name:', api_title, '\nURL:', api_url, '\nAPI Category:', api_category, '\nDescription:', interface_description, '\n---')
    url_tag = soup.find('a',{'title':'Go to next page'})
    # find() returns None when the page has no "next page" link, so check the tag first.
    if url_tag and url_tag.get('href'):
        url = "https://www.programmableweb.com" + url_tag.get('href')
        print(url)
    else:
        break

print('Total APIs:', API_no)
API_df = pd.DataFrame.from_dict(API, orient = 'index', columns = ['API Name', 'URL', 'API Category', 'Description'])

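I expect the code to eventually work through all 21,000 APIs, but it stops because of the redirect error. Any suggestions on how to fix this would be appreciated.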
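
One possible way to keep the loop from aborting is to catch the exception per request and skip the page that keeps redirecting. The sketch below is only an illustration under assumptions: `fetch_html`, `make_session`, `retries`, and `delay` are hypothetical names that do not appear in the code above, the User-Agent string is a placeholder, and the post does not establish whether the redirects come from rate limiting or from a single page that genuinely loops. It relies only on `requests.Session` and `requests.exceptions.TooManyRedirects`, which Requests raises once the session's redirect limit (30 by default) is exceeded.

```python
import time

import requests


def make_session():
    """Create one shared session so connections and headers are reused."""
    session = requests.Session()
    # Placeholder User-Agent (an assumption); some sites treat the default one differently.
    session.headers.update({"User-Agent": "Mozilla/5.0 (compatible; api-directory-scraper)"})
    return session


def fetch_html(session, url, retries=2, delay=1.0):
    """Return the page text, or None if every attempt ends in a redirect loop."""
    for attempt in range(retries + 1):
        try:
            return session.get(url, timeout=30).text
        except requests.exceptions.TooManyRedirects:
            # Wait a little longer after each failed attempt before retrying.
            if attempt < retries:
                time.sleep(delay * (attempt + 1))
    return None
```

In the loop above, `requests.get(api_url).text` could then become `fetch_html(session, api_url)`, with a `None` result recorded as "N/A" so that one looping page does not stop the remaining pages from being scraped. If the main directory page itself returns `None`, breaking out of the `while` loop is probably the safer choice.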

0 answers:

No answers