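I'm trying to use Selenium and BeautifulSoup to "click" a javascript:void(0) link. find_element_by_link_text does return an element (it is not None), but browser.page_source shows no update afterwards, so I'm not sure whether the scrape actually worked.
This is the output of:
PageTable = soup.find('table',{'id':'rzrqjyzlTable'})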
print(PageTable)
<table class="tab1" id="rzrqjyzlTable">
<div id="PageNav" class="PageNav" style="">
<div class="Page" id="PageCont">
<a href="javascript:void(0);" target="_self" class="nolink">Previous</a>3<span class="at">1</span>
<a href="javascript:void(0);" target="_self" title="Page 2">2</a>
<a href="javascript:void(0);" target="_self" title="Page 3">3</a>
<a href="javascript:void(0);" target="_self" title="Page 4">4</a>
<a href="javascript:void(0);" target="_self" title="Page 5">5</a>
<a href="javascript:void(0);" target="_self" title="Next group" class="next">...</a>
<a href="javascript:void(0);" target="_self" title="Last Page">45</a>
<a href="javascript:void(0);" target="_self" title="Page 2">Next Page</a>
<span class="txt"> Jump</span><input class="txt" id="PageContgopage">
<a class="btn_link">Go</a></div>
</div>
The code that clicks "Next Page" looks like this:
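from bs4 import BeautifulSoup
from selenium.common.exceptions import NoSuchElementException

try:
    page = browser.find_element_by_link_text(u'Next Page')
    page.click()
    # implicitly_wait() only sets the timeout for later element lookups;
    # it does not pause until the page content changes after the click
    browser.implicitly_wait(3)
except NoSuchElementException:
    print("NoSuchElementException")
soup = BeautifulSoup(browser.page_source, 'html.parser')
PageTable = soup.find('table', {'id': 'rzrqjyzlTable'})
print(PageTable)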
I expected browser.page_source to be updated.
Answer 0 (score: 0)
After clicking "Next Page", you can reload the web page.
Code:
driver.refresh()
Or use the JavaScript executor:
driver.execute_script("location.reload()")
After that, get the page source the same way you were doing before.
Hope this helps.
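Answer 1 (score: -1)
My guess is that you are pulling the source before the page (or sub-page) has reloaded. I would grab the "Next Page" link, click it, wait for it to go stale (which indicates that the page is reloading), and only then pull the source.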
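from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 10)  # 10-second timeout is an arbitrary choice
page = browser.find_element_by_link_text(u'Next Page')
page.click()
wait.until(EC.staleness_of(page))
# the page should be loading/loaded at this point
# you may need to wait for a specific element to appear to ensure that it's loaded properly since it doesn't seem to be a full page load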
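The last comment above suggests waiting for a specific element rather than relying on staleness alone. In the pager HTML printed in the question, the current page is shown as <span class="at">1</span>, so one option is to poll browser.page_source until that marker shows the page you expect. The sketch below is a minimal stdlib-only version of that idea, assuming the pager keeps the same markup after the AJAX update; current_page and wait_for_page are hypothetical helper names, not part of Selenium or BeautifulSoup.

```python
import time
from html.parser import HTMLParser


class _CurrentPageParser(HTMLParser):
    """Records the text of the first <span class="at"> (assumed current-page marker)."""

    def __init__(self):
        super().__init__()
        self._in_at = False
        self.current = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "at" in classes:
            self._in_at = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_at = False

    def handle_data(self, data):
        if self._in_at and self.current is None and data.strip():
            self.current = data.strip()


def current_page(html):
    """Return the highlighted page number as a string, or None if not found."""
    parser = _CurrentPageParser()
    parser.feed(html)
    return parser.current


def wait_for_page(get_source, expected, timeout=10.0, poll=0.5):
    """Call get_source() until the pager highlights `expected`, then return that HTML.

    Raises TimeoutError if the expected page is not shown before `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_source()
        if current_page(html) == expected:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page {expected!r} not shown within {timeout}s")
        time.sleep(poll)
```

In the question's code this would be called after page.click() as html = wait_for_page(lambda: browser.page_source, "2"), and the returned HTML passed to BeautifulSoup. With Selenium itself, the more usual way to express the same condition is an explicit wait such as WebDriverWait(browser, 10).until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, "#PageCont span.at"), "2")), with By imported from selenium.webdriver.common.by.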