Trouble extracting attributes with Selenium in Python

Time: 2017-07-03 16:25:30

Tags: python selenium

I am trying to get a link from a page with Selenium. The code is as follows:

import time

from selenium import webdriver

link = 'http://cancer.sanger.ac.uk/cosmic/sample/overview?id=2120881'
driver = webdriver.Chrome()
driver.get(link)
elem = driver.find_element_by_link_text("Variants")
elem.click()
time.sleep(2) # wait to load
elems = driver.find_elements_by_xpath("//table[@id='DataTables_Table_0']/tbody/tr[3]/td")
elem = elems[4]
print(elem.get_property('href'))
print(elem.get_attribute("href"))
print(elem.text)

Why is the result None when I try to get the href? How can I get this link?

Thanks in advance!

1 answer:

Answer 0: (score: 2)

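Your script works fine. The problem is in your XPath. If you want the a elements, you need to locate them, not the enclosing td. A td element has no href attribute, which is why get_attribute("href") returns None for it. Change

elems = driver.find_elements_by_xpath("//table[@id='DataTables_Table_0']/tbody/tr[3]/td")

to

elems = driver.find_elements_by_xpath("//table[@id='DataTables_Table_0']/tbody/tr[3]/td/a")

(note the /a after /td)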
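To see the difference between the two XPath expressions without opening a browser, here is a minimal sketch using the standard library's xml.etree.ElementTree instead of Selenium. The SAMPLE markup and its href values are made up for illustration and are not the real COSMIC page; only the table id and the two paths come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the table (NOT the real page markup).
SAMPLE = """
<html><body>
<table id="DataTables_Table_0"><tbody>
<tr><td>row 1</td></tr>
<tr><td>row 2</td></tr>
<tr><td><a href="/gene?id=1">G1</a></td><td><a href="/mut?id=2">M2</a></td></tr>
</tbody></table>
</body></html>
"""

root = ET.fromstring(SAMPLE)
base = ".//table[@id='DataTables_Table_0']/tbody/tr[3]"

cells = root.findall(base + "/td")    # the td elements themselves
links = root.findall(base + "/td/a")  # the a elements inside them

# A td carries no href attribute, so looking it up yields None.
cell_hrefs = [td.get("href") for td in cells]
# The a elements are the ones that actually hold the href.
link_hrefs = [a.get("href") for a in links]

print(cell_hrefs)
print(link_hrefs)
```

Selenium's get_attribute("href") behaves the same way for the td case: the attribute is simply not on that element, so you get None back.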

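One more tip: if you inspect the Variants button, you can see that it also has a URL: http://cancer.sanger.ac.uk/cosmic/sample/overview?id=2120881#datatab. So instead of clicking the button, you only need to append #datatab to the end of your link.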
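If you build these links for several sample ids, appending the fragment can be done with urllib.parse rather than string concatenation. This is only a small sketch; with_fragment is a hypothetical helper name, not part of Selenium or of the original script.

```python
from urllib.parse import urlsplit, urlunsplit


def with_fragment(url, fragment):
    """Return url with its fragment (the part after '#') set to fragment."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(fragment=fragment))


link = 'http://cancer.sanger.ac.uk/cosmic/sample/overview?id=2120881'
tab_link = with_fragment(link, 'datatab')
print(tab_link)
```

The resulting tab_link can be passed straight to driver.get(), which removes the need for the click and the sleep.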

With that said, your final script should look like this:

from selenium import webdriver

link = 'http://cancer.sanger.ac.uk/cosmic/sample/overview?id=2120881#datatab'
driver = webdriver.Chrome()
driver.get(link)
elems = driver.find_elements_by_xpath(
    "//table[@id='DataTables_Table_0']/tbody/tr[3]/td/a")
elem = elems[4]
print(elem.get_property('href'))
print(elem.get_attribute("href"))
print(elem.text)