Scraping live information from FlashScore.ro

Date: 2020-10-25 19:30:58

Tags: python web-scraping beautifulsoup python-requests

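I am trying to scrape information from the live tab of this site: https://www.flashscore.ro/baschet/. I would like to receive an email whenever anything happens.

My problem is with the scraping: the code I have so far returns None. For now I just want to get the name of the home team.

I am new to scraping with Python.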

import requests
from bs4 import BeautifulSoup

URL = 'https://www.flashscore.ro/baschet/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36'}


def find_price():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')

    home_team = soup.html.find('div', {'class': 'event__participant event__participant--home'})
    return home_team


print(find_price())

1 Answer:

Answer 0 (score: 1)

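The site builds its content with JavaScript, which requests does not execute, so the match rows are missing from the HTML it downloads. We can use Selenium instead to scrape the rendered page.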
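To make the None result concrete, here is a minimal sketch that runs the same BeautifulSoup lookup on two inline HTML strings. Both strings, the `live-table` id, and the helper name `first_home_team` are made up for illustration; they only imitate what requests receives and what a browser ends up with after running the page's scripts.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests downloads: the match rows
# are absent because a script would add them later in a real browser.
STATIC_HTML = """
<html>
  <body>
    <div id="live-table"></div>
    <script>/* a browser would build the match rows here */</script>
  </body>
</html>
"""

# Hypothetical stand-in for the DOM after the browser has run the script.
RENDERED_HTML = """
<html>
  <body>
    <div id="live-table">
      <div class="event__participant event__participant--home">Treviso</div>
    </div>
  </body>
</html>
"""

HOME_CLASS = {"class": "event__participant event__participant--home"}


def first_home_team(html):
    """Return the first home-team name found in html, or None if there is none."""
    tag = BeautifulSoup(html, "html.parser").find("div", HOME_CLASS)
    return tag.text if tag is not None else None


print(first_home_team(STATIC_HTML))    # no matching div in the static markup
print(first_home_team(RENDERED_HTML))  # the div exists once the page is rendered
```

The first call finds nothing, which matches what the question's code reports; the second succeeds because the element is present in the markup being parsed.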

Install Selenium with: pip install selenium

Download the correct ChromeDriver from here.

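from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
from time import sleep

URL = "https://www.flashscore.ro/baschet/"

# Selenium 4 takes the driver path through a Service object;
# Selenium 3 accepted it as the first positional argument.
driver = webdriver.Chrome(service=Service(r"C:\path\to\chromedriver.exe"))
driver.get(URL)
# Give the page's JavaScript time to render the match list
sleep(5)

# Parse the rendered DOM, not the raw HTTP response
soup = BeautifulSoup(driver.page_source, "html.parser")

# Print the name of every home team on the page
for tag in soup.find_all(
    "div", {"class": "event__participant event__participant--home"}
):
    print(tag.text)

driver.quit()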

Output:

Lyon-Villeurbanne
Fortitudo Bologna
Virtus Roma
Treviso
Trieste
Trento
Unicaja
Gran Canaria
Galatasaray
Horizont Minsk 2 F
... and so on
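The question also asks for an email whenever something changes. One possible approach, sketched here with only the standard library, is to keep the previous scrape in a dict, compare it with the new one, and mail the differences. Everything in this sketch is an assumption for illustration: the function names, the "Team A - Team B" keys, the score strings, the example.com addresses, and the SMTP settings, which would have to come from your own mail provider.

```python
import smtplib
from email.message import EmailMessage


def new_or_changed(previous, current):
    """Return the entries of current that are missing from or differ in previous."""
    return {
        match: score
        for match, score in current.items()
        if previous.get(match) != score
    }


def build_alert(changes, sender, recipient):
    """Build a plain-text email that lists the changed matches."""
    message = EmailMessage()
    message["Subject"] = f"FlashScore update: {len(changes)} change(s)"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        "\n".join(f"{match}: {score}" for match, score in changes.items())
    )
    return message


def send_alert(message, host, port, user, password):
    """Send the message over SMTP with STARTTLS (server details are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(message)


# Hypothetical snapshots from two consecutive scrapes of the page.
previous = {"Team A - Team B": "10:8"}
current = {"Team A - Team B": "12:8", "Team C - Team D": "0:0"}

changes = new_or_changed(previous, current)
alert = build_alert(changes, "me@example.com", "me@example.com")
print(alert["Subject"])
```

In a real script you would build `current` from the Selenium scrape on each polling cycle and call `send_alert` only when `changes` is not empty; `send_alert` is left uncalled here because it needs real credentials.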