Need help using Python with Selenium to scrape a JavaScript-rendered website

Asked: 2018-06-24 01:35:54

Tags: python-3.x selenium

First time posting here; I hope I've provided enough detail.

I'm trying to scrape the following site: https://www.betbrain.com/baseball/united-states/mlb/

My Python code is as follows:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

delay = 10

browser = webdriver.Chrome()
browser.get('https://www.betbrain.com/baseball/united-states/mlb/')
WebDriverWait(browser, delay).until(ec.presence_of_element_located((By.XPATH, '//*[@id="app"]/div/section/section/nav')))

# find the table containing the games
table_check = browser.find_element_by_xpath('//*[@id="app"]/div/section/section/main/div[3]/div[2]/div[2]/div[1]/ul')
# find each individual game
body_rows = table_check.find_elements_by_xpath('//*[@id="app"]/div/section/section/main/div[3]/div[2]/div[2]/div[1]/ul/li[1]')

This raises:

Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="app"]/div/section/section/main/div[3]/div[2]/div[2]/div[1]/ul"}

When I run it, it fails to locate the element by XPath. Can anyone help? Also, if there is an easier or more stable way to select this information, I'm happy to drop XPath entirely.

Thanks in advance!

3 Answers:

Answer 0 (score: 0)

Try a CSS selector instead of XPath, which is slower and more brittle here.

import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.betbrain.com/baseball/united-states/mlb/')
time.sleep(5)

parent_element = driver.find_element_by_css_selector('div.MatchesListAndHeader > div:nth-child(2) > div:nth-child(1) > ul')

# find all li children of the parent element
children = parent_element.find_elements_by_css_selector('li')

for child in children:
    print(child.text)

driver.quit()

This is just a simple script that grabs all the text from the table on the page, unformatted.

Sample output I got:

24/06/2018 17:05
Boston Red Sox — Seattle Mariners
United StatesMLB 2018
Home
(1.40)
1.46
1xBet
Away
(2.98)
3.10
Mybet
26
4

United States
MLB 2018
Home
(1.40)
1.46
1xBet

Away
(2.98)
3.10
Mybet
24/06/2018 20:07
Los Angeles Angels — Toronto Blue Jays
United StatesMLB 2018
Over
(1.96)
1.96
1xBet
Under
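If you need structured records rather than the raw text dump above, the output can be post-processed. Here is a minimal sketch (pure Python, run against a shortened copy of the sample output above, and assuming each match block starts with a `dd/mm/yyyy hh:mm` kickoff line):

```python
import re

# Shortened copy of the unformatted text the script prints.
raw = """24/06/2018 17:05
Boston Red Sox — Seattle Mariners
Home
(1.40)
1.46
1xBet
24/06/2018 20:07
Los Angeles Angels — Toronto Blue Jays
Over
(1.96)
1.96
1xBet"""

# A kickoff line looks like "24/06/2018 17:05".
DATE_RE = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}$")

matches = []
for line in raw.splitlines():
    if DATE_RE.match(line):
        # Start a new match record at each kickoff line.
        matches.append({"kickoff": line, "details": []})
    elif matches:
        matches[-1]["details"].append(line)

for m in matches:
    print(m["kickoff"], "-", m["details"][0])
```

This is only a grouping heuristic; the real page text may interleave other lines, so treat the date-line assumption as something to verify against your actual output.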

Answer 1 (score: 0)

You can try the following code to get the match details:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome(executable_path=r'D:/Automation/chromedriver.exe')
browser.get("https://www.betbrain.com/baseball/united-states/mlb/")

wait = WebDriverWait(browser, 30)

# wait until the list of matches is actually rendered
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.MatchesList")))

game_names = browser.find_elements_by_css_selector("ul.MatchesList>li a.MatchTitleLink span")

for game in game_names:
    print(game.text)

Answer 2 (score: 0)

Your XPath is unnecessarily complex. Use a CSS selector instead. I see you are trying to get all the match `li` elements; the `li.Match` CSS selector should do it.

matches = driver.find_elements_by_css_selector("li.Match")

should give you all the matches.
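Why a class-based selector survives layout changes while a deep positional XPath does not can be illustrated without a browser. The sketch below uses only the standard-library `html.parser` on a made-up HTML fragment (the class names mirror the page, the content is invented) to collect elements by class, the same way `li.Match` does:

```python
from html.parser import HTMLParser

# Hypothetical fragment standing in for the rendered page.
HTML = """
<ul class="MatchesList">
  <li class="Match">Boston Red Sox - Seattle Mariners</li>
  <li class="Ad">sponsored content</li>
  <li class="Match">Los Angeles Angels - Toronto Blue Jays</li>
</ul>
"""

class MatchCollector(HTMLParser):
    """Collects the text of every <li class="Match">, mimicking li.Match."""

    def __init__(self):
        super().__init__()
        self.in_match = False
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "Match") in attrs:
            self.in_match = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_match = False

    def handle_data(self, data):
        if self.in_match and data.strip():
            self.matches.append(data.strip())

parser = MatchCollector()
parser.feed(HTML)
print(parser.matches)
```

Because selection keys on the `Match` class rather than on a fixed `div[3]/div[2]/...` position, wrapping the list in extra containers or reordering siblings does not break it; that is the property the answers above rely on.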