I'm trying to scrape some data from this website, but I can't seem to find anything... Everything I try to store from soup.select or soup.find_all comes back empty. When I just print soup, it doesn't contain these classes, and I've also tried searching for just one of the classes, so I'm wondering whether I'm missing something basic?
Answer 0 (score: 0)
That's simply because the data is loaded via JavaScript. Consider using Selenium:
from bs4 import BeautifulSoup
from selenium import webdriver
url = 'https://www.oddsportal.com/rugby-union/france/pro-d2/results/'
DRIVER_PATH="Your selenium chrome driver path"
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
seasons = soup.find_all('ul', {'class': 'main-filter'}) # list of links for all seasons
print(seasons)
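For reference, the soup.select call the question attempted works the same way once the HTML actually contains the rendered markup; a minimal offline sketch with hypothetical stand-in markup (not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; the real page's markup may differ
html = '<div><ul class="main-filter"><li><a href="/x/">X</a></li></ul></div>'
soup = BeautifulSoup(html, 'html.parser')

css = soup.select('ul.main-filter')                 # CSS-selector form
fa = soup.find_all('ul', {'class': 'main-filter'})  # equivalent find_all form
print(len(css), len(fa))  # both match the same single element
```

Both queries return the same tags, so whichever was empty before was empty because the classes simply weren't in the HTML that requests fetched, not because of the query itself.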
Answer 1 (score: 0)
This is a JS-rendered page, so you need Selenium to automate it. You don't have to install chromedriver manually and then point executable_path at it; you can use the webdriver_manager module instead:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(ChromeDriverManager().install(), chrome_options=chrome_options)
url = 'https://www.oddsportal.com/rugby-union/france/pro-d2/results/'
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
seasons = soup.find_all('ul', {'class': 'main-filter'}) # list of links for all seasons
print(seasons)
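Once seasons is in hand, pulling out the actual season links is plain BeautifulSoup work; a sketch using hypothetical markup in place of the live page (the class name comes from the answers above, but the link structure here is assumed):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML approximating the season filter; the live page may differ
html = """
<ul class="main-filter">
  <li><a href="/rugby-union/france/pro-d2-2018-2019/results/">2018/2019</a></li>
  <li><a href="/rugby-union/france/pro-d2-2019-2020/results/">2019/2020</a></li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')
seasons = soup.find_all('ul', {'class': 'main-filter'})

# Flatten each <ul> into its anchors' href attributes
links = [a['href'] for ul in seasons for a in ul.find_all('a')]
print(links)
```

On the real page you would run the same list comprehension over the seasons list built from driver.page_source.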