This is my first post here, so I hope I have provided enough detail.
I am trying to scrape the following page: https://www.betbrain.com/baseball/united-states/mlb/
My Python code is below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

delay = 10  # seconds to wait for the page to render
browser = webdriver.Chrome()
browser.get('https://www.betbrain.com/baseball/united-states/mlb/')
WebDriverWait(browser, delay).until(ec.presence_of_element_located((By.XPATH, '//*[@id="app"]/div/section/section/nav')))
table_check = browser.find_element_by_xpath('//*[@id="app"]/div/section/section/main/div[3]/div[2]/div[2]/div[1]/ul')  # find the table containing games
body_rows = table_check.find_elements_by_xpath('//*[@id="app"]/div/section/section/main/div[3]/div[2]/div[2]/div[1]/ul/li[1]')  # find each individual game
Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="app"]/div/section/section/main/div[3]/div[2]/div[2]/div[1]/ul"}
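When I run it, Selenium cannot find the element at that XPath. Can anyone help? Also, if there is an easier or more stable way to select the information, I am happy to drop XPath.
Thanks in advance!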
Answer 0 (score: 0)
Try using CSS selectors instead of a long absolute XPath, which is slower and brittle.
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.betbrain.com/baseball/united-states/mlb/')
time.sleep(5)  # crude fixed wait for the JavaScript-rendered list to load
parent_element = driver.find_element_by_css_selector('div.MatchesListAndHeader > div:nth-child(2) > div:nth-child(1) > ul')
# find all li children of the parent element
child = parent_element.find_elements_by_css_selector('li')
for i in child:
    print(i.text)
driver.quit()
This is only a simple script: it prints all the text from the table on the page, without any formatting.
Sample output I got:
24/06/2018 17:05
Boston Red Sox — Seattle Mariners
United StatesMLB 2018
Home
(1.40)
1.46
1xBet
Away
(2.98)
3.10
Mybet
26
4
United States
MLB 2018
Home
(1.40)
1.46
1xBet
Away
(2.98)
3.10
Mybet
24/06/2018 20:07
Los Angeles Angels — Toronto Blue Jays
United StatesMLB 2018
Over
(1.96)
1.96
1xBet
Under
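The unformatted text above can be turned into structured records with a little post-processing. The following is only a minimal sketch: the function name `parse_games` is hypothetical, and it assumes that each game starts with a `dd/mm/yyyy hh:mm` line followed by a line with the two team names separated by a long dash, as in the sample output. If the site changes its layout, the pattern will need adjusting.

```python
import re
from datetime import datetime

# Assumed formats, taken from the sample output above.
DATE_RE = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}$")
SEPARATOR = " \u2014 "  # the long dash between the two team names


def parse_games(text):
    """Return one dict per game found in the raw text of the list items."""
    lines = [line.strip() for line in text.splitlines()]
    games = []
    # Look at each line together with the line that follows it.
    for current, following in zip(lines, lines[1:]):
        if DATE_RE.match(current) and SEPARATOR in following:
            games.append({
                "start": datetime.strptime(current, "%d/%m/%Y %H:%M"),
                "teams": tuple(following.split(SEPARATOR, 1)),
            })
    return games
```

With the script above, `parse_games("\n".join(i.text for i in child))` would give a list of dictionaries that is easier to work with than the printed text. The odds lines are ignored here because their layout differs between items in the sample.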
Answer 1 (score: 0)
You can try the following code to get the match details:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome(executable_path=r'D:/Automation/chromedriver.exe')
browser.get("https://www.betbrain.com/baseball/united-states/mlb/")
# wait up to 30 seconds for the list of matches to become visible
wait = WebDriverWait(browser, 30)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.MatchesList")))
# each match title is a span inside the a.MatchTitleLink link
game_names = browser.find_elements_by_css_selector("ul.MatchesList>li a.MatchTitleLink span")
for game in game_names:
    print(game.text)
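Answer 2 (score: 0)
Your XPath is unnecessarily complex.
Use a CSS selector instead. I see you are trying to get the li element of every match. The CSS selector li.Match should do it:
matches = driver.find_elements_by_css_selector("li.Match")
This should give you all of the matches.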
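One caveat for readers on newer Selenium releases: the `find_elements_by_css_selector` style helpers were removed in Selenium 4.3, where locators are passed as `(By.CSS_SELECTOR, ...)` instead. The sketch below combines the `li.Match` selector from this answer with an explicit wait. The function names `collect_match_texts` and `non_empty_texts` are hypothetical, and whether `li.Match` still matches the live page is an assumption.

```python
def non_empty_texts(elements):
    """Return the stripped text of each element, skipping blank ones."""
    texts = (element.text.strip() for element in elements)
    return [text for text in texts if text]


def collect_match_texts(driver, selector="li.Match", timeout=30):
    """Wait until elements matching the selector exist, then return their text."""
    # Imported here so that non_empty_texts can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # until() returns the list of elements once at least one is present,
    # and raises TimeoutException if none appear within the timeout.
    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
    )
    return non_empty_texts(elements)


# Intended usage (needs a browser and driver, so it is left as a comment):
#   driver = webdriver.Chrome()
#   driver.get("https://www.betbrain.com/baseball/united-states/mlb/")
#   for text in collect_match_texts(driver):
#       print(text)
#   driver.quit()
```

Waiting for the elements themselves, rather than sleeping for a fixed time, avoids the "no such element" error when the list is rendered by JavaScript after the initial page load.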