I'm trying to scrape some data from this website, but I can't seem to find anything... Everything I try to store from soup.select or soup.find_all comes back empty. When I just print soup, it doesn't contain these classes, and I've also tried searching for just one of the classes, so I'm wondering whether I'm missing something basic?
Answer 0 (score: 0)
That's simply because the data is loaded via JavaScript. Consider using Selenium:
from bs4 import BeautifulSoup
from selenium import webdriver
url = 'https://www.oddsportal.com/rugby-union/france/pro-d2/results/'
DRIVER_PATH="Your selenium chrome driver path"
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
seasons = soup.find_all('ul', {'class': 'main-filter'}) # list of links for all seasons
print(seasons)
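For reference, the soup.select call the question attempted works the same way once the HTML actually contains the rendered markup; a minimal offline sketch with hypothetical stand-in markup (not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; the real page's markup may differ
html = '<div><ul class="main-filter"><li><a href="/x/">X</a></li></ul></div>'
soup = BeautifulSoup(html, 'html.parser')

css = soup.select('ul.main-filter')                 # CSS-selector form
fa = soup.find_all('ul', {'class': 'main-filter'})  # equivalent find_all form
print(len(css), len(fa))  # both match the same single element
```

Both queries return the same tags, so whichever was empty before was empty because the classes simply weren't in the HTML that requests fetched, not because of the query itself.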
Answer 1 (score: 0)
This is a JS-rendered page, so you need Selenium to automate it. You don't have to install chromedriver manually and then point executable_path at it; you can use the webdriver_manager module instead:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(ChromeDriverManager().install(), chrome_options=chrome_options)
url = 'https://www.oddsportal.com/rugby-union/france/pro-d2/results/'
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
seasons = soup.find_all('ul', {'class': 'main-filter'}) # list of links for all seasons
print(seasons)
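Once seasons is in hand, pulling out the actual season links is plain BeautifulSoup work; a sketch using hypothetical markup in place of the live page (the class name comes from the answers above, but the link structure here is assumed):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML approximating the season filter; the live page may differ
html = """
<ul class="main-filter">
  <li><a href="/rugby-union/france/pro-d2-2018-2019/results/">2018/2019</a></li>
  <li><a href="/rugby-union/france/pro-d2-2019-2020/results/">2019/2020</a></li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')
seasons = soup.find_all('ul', {'class': 'main-filter'})

# Flatten each <ul> into its anchors' href attributes
links = [a['href'] for ul in seasons for a in ul.find_all('a')]
print(links)
```

On the real page you would run the same list comprehension over the seasons list built from driver.page_source.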