Unable to catch HTTP Error 500: Internal Server Error

Time: 2019-01-30 03:49:49

Tags: python python-3.x for-loop try-catch

I can't get try & except to work inside my loop. If you want to reproduce the error, see the following code:

import datetime
import pandas as pd
import urllib.request
from urllib.error import HTTPError

start = datetime.datetime.strptime("19-09-2016", "%d-%m-%Y")
end = datetime.datetime.strptime("31-12-2017", "%d-%m-%Y")
date_generated = [start + datetime.timedelta(days = x) for x in range(0, (end - start).days)]

dates_list = []
for date in date_generated:
    txt = str(str(date.day) + '.' + str(date.month) + '.' + str(date.year))
    dates_list.append(txt)

ndf = pd.DataFrame()  # create empty ndf
for i in range(0, len(dates_list)):
    allURL = 'https://www.uzse.uz/trade_results?date=' + dates_list[i] + '&locale=en&mkt_id=ALL&page=%d'

    for k in range(1, 100):
        url = allURL % k

        errors = []
        try:
            pd.read_html(url)[0].empty
        except HTTPError:
            errors.append(url)

        if pd.read_html(url)[0].empty:
            break
        else:
            chunk = pd.read_html(url)[0]
            chunk['Date'] = dates_list[i] # Date is positioned at last position, let's fix that
            cols = chunk.columns.tolist() # get a list of all the columns
            cols = cols[-1:] + cols[:-1] # rearrange the columns, move the last element (Date) to the first position
            chunk = chunk[cols] # reorder the dataframe
            ndf = pd.concat([ndf, chunk])

print(ndf)

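I have tried modifying the try & except in several ways, but I can't get it to work. I would also like to store all of the broken URLs for further manual inspection. The code above reports this: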

HTTPError                                 Traceback (most recent call last)
<ipython-input-6-31cafbad5945> in <module>()
     26             errors.append(url)
     27 
---> 28         if pd.read_html(url)[0].empty:
     29             break
     30         else:

 346     # this version of raise is a syntax error in Python 3

HTTPError: HTTP Error 500: Internal Server Error 

1 Answer:

Answer 0 (score: 1)

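The error is triggered by the second pd.read_html(url) call, in the line if pd.read_html(url)[0].empty:. The first call is handled by the try/except HTTPError block, but the second time you request the same URL without any safeguard, and it fails just as before.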

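Secondly, because of the break, errors holds only one entry. Note also that errors = [] sits inside the inner loop in your code, so the list is emptied on every page. I'm not sure whether you want to save all failed URLs or only those from a single loop.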
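The first point can be sketched in isolation: request each URL exactly once, inside the try block, and reuse the result afterwards. The helper name read_first_table and its reader and errors parameters are hypothetical, introduced only for illustration; reader stands for any callable shaped like pd.read_html (URL in, list of tables out).

```python
from urllib.error import HTTPError


def read_first_table(url, reader, errors):
    """Call reader(url) exactly once; return the first table, or None on HTTPError.

    Hypothetical helper: `reader` is assumed to behave like pd.read_html,
    and `errors` is a list owned by the caller.
    """
    try:
        tables = reader(url)  # the only request made for this URL
    except HTTPError:
        errors.append(url)  # keep the failing URL for later manual inspection
        return None
    return tables[0]
```

With pandas this would be called as read_first_table(url, pd.read_html, errors); a None result means the URL was recorded in errors and the page can be skipped.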

Try this.

import datetime
import pandas as pd
from urllib.error import HTTPError

start = datetime.datetime.strptime("19-09-2016", "%d-%m-%Y")
end = datetime.datetime.strptime("31-12-2017", "%d-%m-%Y")
date_generated = [start + datetime.timedelta(days=x) for x in range(0, (end - start).days)]

dates_list = []
for date in date_generated:
    txt = str(str(date.day) + '.' + str(date.month) + '.' + str(date.year))
    dates_list.append(txt)

ndf = pd.DataFrame()  # create empty ndf
for i in range(0, len(dates_list)):
    allURL = 'https://www.uzse.uz/trade_results?date=' + dates_list[i] + '&locale=en&mkt_id=ALL&page=%d'
    errors = []

    for k in range(1, 100):
        url = allURL % k
        try:
            chunk = pd.read_html(url)[0]  # request each page only once
        except HTTPError:
            errors.append(url)  # remember the failing URL for manual inspection
            continue  # skip this page and move on to the next one
        if chunk.empty:
            break  # no more result pages for this date
        chunk['Date'] = dates_list[i]  # Date is positioned at last position, let's fix that
        cols = chunk.columns.tolist()  # get a list of all the columns
        cols = cols[-1:] + cols[:-1]  # rearrange the columns, move the last element (Date) to the first position
        chunk = chunk[cols]  # reorder the dataframe
        ndf = pd.concat([ndf, chunk])

    print(errors)
print(ndf)
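
If the goal is to keep every failed URL rather than only those of the current date, one option is to create the container once, before both loops, and group the failures by date. The following is a minimal sketch of just that bookkeeping, not of the full scraper: collect_failures, reader and url_template are hypothetical names, and reader is again assumed to behave like pd.read_html.

```python
from urllib.error import HTTPError


def collect_failures(dates, pages, reader, url_template):
    """Return {date: [failed URLs]} for every date with at least one HTTPError.

    Hypothetical helper: `url_template` is assumed to contain {date} and
    {page} placeholders, and `reader` to behave like pd.read_html.
    """
    failures = {}  # created once, so entries survive across all dates
    for date in dates:
        for page in pages:
            url = url_template.format(date=date, page=page)
            try:
                reader(url)
            except HTTPError:
                # group failing URLs under the date they belong to
                failures.setdefault(date, []).append(url)
    return failures
```

Because the dictionary is keyed by date, it covers both readings of the question: failures[date] gives the broken URLs of one loop, and the dictionary as a whole gives all of them.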