Web scraping with Selenium Python

Time: 2020-08-12 08:34:42

Tags: python selenium selenium-webdriver xpath web-scraping

I want to scrape the opening odds from this page: https://www.betexplorer.com/soccer/russia/premier-league/results/. I tried this code:

    try:
        driver.find_element_by_xpath("//td[a[.='bet365']]/following-sibling::td[span]")
    except NoSuchElementException:
        homeodd = 'no bet365 odd'
        drawodd = 'no bet365 odd'
        awayodd = 'no bet365 odd'
    else:
        driver.find_element_by_xpath("//*[@id='sortable-1']/tbody/tr[6]/td[4]").click()
        sleep(3)
        homeodd = driver.find_element_by_xpath("//table[ends-with(@id,'16')]//tr[th='Opening odds']/following-sibling::tr/td[@class='bold']").text
        print(homeodd)
        driver.find_element_by_xpath("//*[@id='sortable-1']/tbody/tr[6]/td[5]").click()
        sleep(3)
        drawodd = driver.find_element_by_xpath("//table[ends-with(@id,'16')]//tr[th='Opening odds']/following-sibling::tr/td[@class='bold']").text
        print(drawodd)
        driver.find_element_by_xpath("//*[@id='sortable-1']/tbody/tr[6]/td[6]").click()
        sleep(3)
        awayodd = driver.find_element_by_xpath("//table[ends-with(@id,'16')]//tr[th='Opening odds']/following-sibling::tr/td[@class='bold']").text
        print(awayodd)

I get this error: SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//table[ends-with(@id,'16')]//tr[th='Opening odds']/following-sibling::tr/td[@class='bold']' is not a valid XPath expression. So I must have the XPath syntax wrong.

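The problem is that this page has no data-opening-odd attribute. In a previous post I asked a different question, "scraping with selenium web driver", and thanks to a good suggestion from the community I found a good solution: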

    try:
        driver.find_element_by_xpath("//td[a[.='bet365']]/following-sibling::td[span]")
    except NoSuchElementException:
        homeodd = 'no bet365 odd'
        drawodd = 'no bet365 odd'
        awayodd = 'no bet365 odd'
    else:
        homeodd = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//td[a[.="bet365"]]/following-sibling::td[span][1]'))).get_attribute("data-opening-odd")
        drawodd = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//td[a[.="bet365"]]/following-sibling::td[span][2]'))).get_attribute("data-opening-odd")
        awayodd = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//td[a[.="bet365"]]/following-sibling::td[span][3]'))).get_attribute("data-opening-odd")

Can anyone suggest a second solution that does not read the odds from the data-opening-odd attribute? Thanks.

2 answers:

Answer 0: (score: 0)

The ends-with function comes from XPath 2.0, but browsers still support only XPath 1.0. The simplest fix is to replace

table[ends-with(@id,'16')]

with

table[contains(@id,'16')]

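(note that contains matches any id with '16' anywhere in it, not only at the end), or use the more complex expression, which reproduces ends-with exactly in XPath 1.0: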

table[substring(@id, string-length(@id) - string-length('16') + 1) = '16']
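If the suffix changes from page to page, the substring expression above can be generated rather than typed by hand. Below is a minimal sketch, assuming a hypothetical helper named xpath_ends_with (not part of Selenium or of the original answer) that only builds the predicate string; it does not evaluate any XPath itself.

```python
def xpath_ends_with(attribute: str, suffix: str) -> str:
    """Build an XPath 1.0 predicate that emulates XPath 2.0 ends-with().

    Hypothetical helper for illustration: it returns a string such as
    substring(@id, string-length(@id) - string-length('16') + 1) = '16'
    """
    # XPath 1.0 has no escape syntax inside string literals, so a suffix
    # holding both quote characters cannot be written as one literal.
    if "'" in suffix and '"' in suffix:
        raise ValueError("suffix must not contain both ' and \" characters")
    quote = '"' if "'" in suffix else "'"
    literal = f"{quote}{suffix}{quote}"
    return (
        f"substring(@{attribute}, "
        f"string-length(@{attribute}) - string-length({literal}) + 1) = {literal}"
    )


# Build the table selector from the question with the XPath 1.0 predicate.
predicate = xpath_ends_with("id", "16")
table_xpath = f"//table[{predicate}]"
print(table_xpath)
```

The resulting string can then be used in place of the ends-with selector in the question, for example as the first part of the XPath passed to driver.find_element_by_xpath, with the rest of the path (//tr[th='Opening odds']/...) appended unchanged.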

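Answer 1: (score: -1)

A simple solution is to use a piece of software called ParseHub: https://www.parsehub.com/quickstart

You can use it to scrape data from any website! It is really easy to use. If you need any help with it, let me know and I will be glad to help, but I think the YouTube tutorials are probably enough to learn how to use it!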