I'm using Selenium to scrape a website and collect the relevant data. I wrote the following script:
import time
from selenium.webdriver.common.by import By

def scrap_understat(bot_function):
    init_db_connection()
    init_browser('firefox')
    for i in range(7240, 8000):
        try:
            driver.get('https://understat.com/player/' + str(i))
            # Reload the page until the element with ID 'header' shows up
            while not wait_for_element(driver, By.ID, 'header'):
                driver.get('https://understat.com/player/' + str(i))
                time.sleep(1)
            # Skip pages that show an error code (e.g. 404)
            if try_find_Element(driver, By.CLASS_NAME, 'error-code') is not None:
                continue
            player_name = driver.find_element(By.CLASS_NAME, 'header-wrapper').text
            current_team = driver.find_element(By.CLASS_NAME, 'breadcrumb').find_elements(By.TAG_NAME, 'li')[-2].text
            data = get_player_data(player_name, current_team)
            save_data(data)
        except Exception as ex:
            log_this(ex)
            print(str(ex))
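However, the script stops at the first "error" page: https://understat.com/player/7245
This page clearly contains the error-code class. If you inspect the page, you see the following:
<span class="error-code">404</span>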
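Pages for existing players do not contain this class. When the class is present, I use
continue to move on to the next iteration. However, the browser freezes on https://understat.com/player/7245. What am I doing wrong?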
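
One thing worth checking, stated as an assumption and not confirmed for this site: if the 404 page has no element with ID 'header', the while loop keeps reloading the page forever and the continue line is never reached, which would look like a frozen browser. Below is a minimal sketch of a bounded retry that cannot loop forever. The helper name load_with_retries and its parameters are hypothetical and not part of Selenium.

```python
import time
from typing import Callable


def load_with_retries(load_page: Callable[[], None],
                      page_is_ready: Callable[[], bool],
                      max_attempts: int = 3,
                      delay_seconds: float = 1.0) -> bool:
    """Load a page, retrying at most max_attempts times.

    Returns True as soon as page_is_ready() reports success, or False once
    all attempts are used up, so the caller never loops forever.
    """
    for attempt in range(max_attempts):
        load_page()
        if page_is_ready():
            return True
        # Pause between attempts, but not after the last one
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return False
```

In the script above, load_page could be a lambda calling driver.get(url) and page_is_ready a lambda calling wait_for_element(driver, By.ID, 'header'); a False result would then be logged and the player skipped. Checking for the error-code class before waiting for the header would be another way to skip 404 pages early.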
Thanks in advance.