Access denied when scraping with Selenium in Python

Date: 2020-02-04 19:26:18

Tags: python selenium web-scraping

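Is there a way around the "Access Denied" error I get when running this code against fantasy.premierleague.com? I tried adding random time delays between requests, but that does not seem to have any effect.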

import random
import requests
import time
from bs4 import BeautifulSoup
from selenium import webdriver

# Start Firefox through a local geckodriver binary.
driver = webdriver.Firefox(executable_path=r'C:\Users\benja\Desktop\geckodriver.exe')

# Fetch standings pages 7 and 8.
for y in range(7, 9):
    driver.get('https://fantasy.premierleague.com/leagues/181/standings/c?phase=1&page_new_entries=1&page_standings='+str(y))
    # Grab the DOM as currently rendered in the browser.
    html = driver.execute_script("return document.documentElement.outerHTML")
    sel_soup = BeautifulSoup(html, 'html.parser')
    leaderboard = sel_soup.find('table', { 'class': 'Table-ziussd-1 hOInPp' })
    tbody = leaderboard.find('tbody')

    # Print the name and link found in the second column of each standings row.
    for tr in tbody.find_all('tr', {'class': 'StandingsRow-fwk48s-0 jRzimt'}):
        navn = tr.find_all('td')[1].find_all('a')[0].text.strip()
        link = tr.find_all('td')[1].find_all('a')[0]['href']
        print(navn, link)
    # Wait a random 5 to 10 seconds before requesting the next page.
    time.sleep(random.randint(5, 10))
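
To tell a blocked response apart from a page whose table class names have simply changed, a small check along these lines might help. This is only a diagnostic sketch and does not get around the block. The function name `looks_like_access_denied` and the marker phrases are assumptions, since the exact wording of the block page is not quoted above.

```python
import re

# Assumed marker phrases; the exact text of the block page is not given above.
DENIED_MARKERS = ("access denied", "403 forbidden")


def looks_like_access_denied(html: str) -> bool:
    """Return True if the <title> or the first <h1> contains a denial phrase."""
    for tag in ("title", "h1"):
        match = re.search(rf"<{tag}[^>]*>(.*?)</{tag}>", html, re.IGNORECASE | re.DOTALL)
        if match and any(marker in match.group(1).lower() for marker in DENIED_MARKERS):
            return True
    return False


if __name__ == "__main__":
    blocked = "<html><head><title>Access Denied</title></head><body></body></html>"
    normal = "<html><head><title>Standings</title></head><body><table></table></body></html>"
    print(looks_like_access_denied(blocked))
    print(looks_like_access_denied(normal))
```

Inside the loop, this could be called on `html` right after it is read from the driver, skipping or stopping when it returns True. Without such a guard, `sel_soup.find(...)` returns None when the table is missing, and the following `leaderboard.find('tbody')` raises an AttributeError.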

0 answers:

No answers