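Hello, I am practicing extracting information from websites.
(I am using Python, selenium, and BeautifulSoup, but that is not important. The question is how to find an element in the HTML.)
So (1) I want the information in a table. Using the Firefox Inspector, I found the table: <table id='......'>
(2) But I cannot find it from my code: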
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
url = 'http://corp.sec.state.ma.us/corpweb/UCCSearch/UCCSearch.aspx'
driver = webdriver.Firefox()
driver.get(url)
# navigate to the page I want using selenium
driver.find_element_by_id("MainContent_rdoSearchO").click()
driver.find_element_by_id("MainContent_txtName").send_keys("mcdonald")
Select(driver.find_element_by_id("MainContent_cboOState")).select_by_visible_text("Massachusetts")
Select(driver.find_element_by_id("MainContent_UCCSearchMethodO")).select_by_visible_text("Begins With")
driver.find_element_by_id("MainContent_btnSearch").click()
# now on next page, click link (selenium)
link_text = '95352026'
driver.find_element_by_link_text(link_text).click()
### real question starts here:
# now on the page I want
# in firefox inspector find: <table id="MainContent_tblFilingHistory">
table_id = 'MainContent_tblFilingHistory'
# try find it
table = driver.find_elements_by_id(table_id)
len(table) # length = 0, can't find it
html = driver.page_source  # HTML of the window the driver is currently focused on
html.find(table_id) # -1, HTML really doesn't have this string
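Answer 0 (score: 1)
The element you cannot find is in another window, while the driver is still focused on the original one. You need to tell the driver to switch its context to that window: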
from selenium import webdriver
from selenium.webdriver.support.ui import Select
driver = webdriver.Firefox()
driver.get('http://corp.sec.state.ma.us/corpweb/UCCSearch/UCCSearch.aspx')
driver.find_element_by_id("MainContent_rdoSearchO").click()
driver.find_element_by_id("MainContent_txtName").send_keys("mcdonald")
Select(driver.find_element_by_id("MainContent_cboOState")).select_by_visible_text("Massachusetts")
Select(driver.find_element_by_id("MainContent_UCCSearchMethodO")).select_by_visible_text("Begins With")
driver.find_element_by_id("MainContent_btnSearch").click()
driver.find_element_by_link_text('95352026').click()
# switch to the newly opened window (switch_to_window() is the older, deprecated spelling)
driver.switch_to.window(driver.window_handles[1])
table = driver.find_elements_by_id('MainContent_tblFilingHistory')
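One caveat with the snippet above: indexing window_handles with [1] assumes the new window is already open and is listed second, and as far as I know Selenium does not promise either. Below is a minimal sketch of a more defensive approach. It records the handles before the click, then waits for a handle that was not there before. The helper name switch_to_new_window and its timeout and poll parameters are my own invention, not part of Selenium; the only driver features it relies on are driver.window_handles and driver.switch_to.window().

```python
import time


def switch_to_new_window(driver, known_handles, timeout=10.0, poll=0.2):
    """Switch the driver to the first window handle not in known_handles.

    Hypothetical helper, not part of Selenium. It only uses
    driver.window_handles and driver.switch_to.window(handle).
    Returns the new handle, or raises TimeoutError if none appears in time.
    """
    known = set(known_handles)
    deadline = time.monotonic() + timeout
    while True:
        # Any handle that was not present before the click is the new window.
        new_handles = [h for h in driver.window_handles if h not in known]
        if new_handles:
            driver.switch_to.window(new_handles[0])
            return new_handles[0]
        if time.monotonic() >= deadline:
            raise TimeoutError("no new window appeared within %.1f s" % timeout)
        time.sleep(poll)


# Intended use with the script above (not executed here):
#   before = driver.window_handles          # handles before the click
#   driver.find_element_by_link_text('95352026').click()
#   switch_to_new_window(driver, before)
#   table = driver.find_elements_by_id('MainContent_tblFilingHistory')
```

If the script later needs the search results again, keeping driver.current_window_handle from before the click lets you switch back with driver.switch_to.window() in the same way.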