Python and Selenium web scraping: unable to find element

Asked: 2016-10-28 03:34:29

Tags: python selenium web-scraping phantomjs

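I am trying to write a web scraper in Python that triggers the "onclick" handler of certain buttons on a web page, because doing so converts the table holding the data I want into CSV, which is much easier to work with. The problem is that when I use PhantomJS, I cannot locate the element by XPath. How can I click the element and get at the CSV content I want?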

Here is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By

from selenium.webdriver.common.proxy import *

url = "http://www.pro-football-reference.com/boxscores/201609180nwe.htm"
xpath = "//*[@id='all_player_offense']/div[1]/div/ul/li[1]/div/ul/li[3]/button"

path_to_phantomjs = 'browser/phantomjs'
browser = webdriver.PhantomJS(executable_path = path_to_phantomjs)
browser.get(url)

delay=3
element_present = EC.presence_of_element_located((By.ID, 'all_player_offense'))
WebDriverWait(browser, delay).until(element_present)

browser.find_element_by_xpath(xpath).click()

I get this error:

selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '//*[@id='all_player_offense']/div[1]/div/ul/li[1]/div/ul/li[3]/button'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"153","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:50989","User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"sessionId\": \"93ff24f0-9cbe-11e6-8711-bdfa3ff9cfb1\", \"value\": \"//*[@id='all_player_offense']/div[1]/div/ul/li[1]/div/ul/li[3]/button\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/93ff24f0-9cbe-11e6-8711-bdfa3ff9cfb1/element"}}
Screenshot: available via screen

1 answer:

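Answer 0 (score: 1)

Important note: as mentioned in this issue on GitHub, try calling set_window_size(width, height) or maximize_window() after setting up the webdriver. You should also consider telling the webdriver to implicitly_wait(10) for the element to appear.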
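The note above can be sketched as a small helper. This is only an illustration: the function name configure_driver and the default viewport of 1280x1024 are assumptions, not values from the linked issue. It relies only on the set_window_size() and implicitly_wait() methods that Selenium drivers expose.

```python
def configure_driver(browser, width=1280, height=1024, wait_seconds=10):
    """Give the headless browser an explicit viewport and an implicit wait.

    `browser` is any Selenium WebDriver instance. The default sizes are
    illustrative assumptions, not recommended values.
    """
    # Set an explicit window size so the page is laid out in a known viewport.
    browser.set_window_size(width, height)
    # Make element lookups poll for up to `wait_seconds` before giving up.
    browser.implicitly_wait(wait_seconds)
    return browser


# Usage with the driver from the question (needs Selenium and PhantomJS):
# browser = configure_driver(webdriver.PhantomJS(executable_path=path_to_phantomjs))
# browser.get(url)
```

Call the helper right after creating the driver and before browser.get(url), so the settings are in place when the page loads.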

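For Selenium WebDriver to correctly emulate what you do by hand, you have to perform one special action. Essentially, to get the data you want, you must:

A. Hover over the dropdown element that contains the button. Then

B. Click "Get table as CSV (for Excel)".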

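For A, this means placing the simulated cursor over the element without clicking it. This "mouseover" can be done with the move_to_element() method of the ActionChains class. So at the top of your script, add this import: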

from selenium.webdriver.common.action_chains import ActionChains

You want Selenium to find the specific element and move the cursor to it. You can do this in two lines of code:

dropdown = browser.find_element_by_xpath('//*[@id="all_player_offense"]/div[1]/div/ul/li[1]')
ActionChains(browser).move_to_element(dropdown).perform()

If you omit the lines above, you will get an ElementNotVisibleException.

Now for B, you should be able to call browser.find_element_by_xpath(xpath).click().
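Once the click succeeds, the remaining step in the question is reading the CSV content. The sketch below assumes you have already pulled the CSV out of the page as a plain string, for example from the .text of whichever element the page renders it into (this thread does not identify that element). The helper name parse_csv_text and the sample data are hypothetical. It is written for Python 3.

```python
import csv
import io


def parse_csv_text(csv_text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    # Wrap the string in a file-like object so csv.DictReader can read it.
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    return [dict(row) for row in reader]


# Hypothetical sample standing in for the text scraped from the page.
sample = "Player,Tm,Yds\nPlayer A,NWE,100\nPlayer B,MIA,85\n"
rows = parse_csv_text(sample)
print(rows[0]["Player"], rows[0]["Yds"])
```

All values come back as strings, so convert numeric columns yourself, for example with int(row["Yds"]).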