Extracting text with Selenium WebDriver when the page source is not fully displayed

Time: 2017-08-09 22:02:43

Tags: python selenium selenium-webdriver webdriver automated-tests

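How can I extract text with Selenium WebDriver when the page source is not fully displayed? Or how do you handle this with your own tools?

It looks like this website blocks manual extraction of the fields and only allows manually downloading a preprocessed CSV file.

When I look for the search data in the page source (Ctrl + U or Strg + U), I cannot see the table.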

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from time import sleep

# driver = webdriver.PhantomJS()
driver = webdriver.Chrome()

driverurl = "https://officialrecords.broward.org/AcclaimWeb"

driver.get(driverurl)
driver.find_element_by_id("btnButton").click()

Name = "John"
DocType = "DEED TRANSFERS OF REAL PROPERTY (D)"
RecordDate = "8/1/2017"

driver.find_element_by_id("SearchOnName").send_keys(Name)

driver.find_element_by_id("DocTypesDisplay-input").clear()
sleep(1)
driver.find_element_by_id("DocTypesDisplay-input").send_keys(DocType)

driver.find_element_by_id("RecordDateFrom").clear()
driver.find_element_by_id("RecordDateFrom").send_keys(RecordDate)
driver.find_element_by_id("RecordDateTo").clear()
driver.find_element_by_id("RecordDateTo").send_keys(RecordDate)

driver.find_element_by_id("btnSearch").click()

html = driver.page_source 
soup = BeautifulSoup(html, "lxml")

Row = soup.findAll("tr", { "class" : "t-state-selected" })[0].findAll("td")
SearchedName = Row[1].get_text()
RecordDate   = Row[5].get_text()
print SearchedName
print RecordDate

# driver.close()
  

Traceback (most recent call last):
  File "test.py", line 34, in <module>
    Row = soup.findAll("tr", { "class" : "t-state-selected" })[0].findAll("td")
IndexError: list index out of range

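There are more elements in the list.

I want to save the generated CSV file automatically, or write a bot that walks through all the rows and saves them to my own CSV file.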
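Once the rows have been extracted as lists of cell text, writing them to a CSV file needs only the standard library. The following is a minimal sketch for Python 3, not the site's own export: the function name `save_rows`, the header names, and the file name `results.csv` are all assumptions made for illustration.

```python
import csv


def save_rows(rows, out_file):
    """Write rows (lists of cell strings) as CSV to an open text file."""
    writer = csv.writer(out_file)
    # Hypothetical header names; adjust them to the columns actually scraped.
    writer.writerow(["Name", "RecordDate"])
    writer.writerows(rows)


# Hypothetical usage, one row per search result:
# with open("results.csv", "w", newline="") as f:
#     save_rows([[SearchedName, RecordDate]], f)
```

The `csv` module quotes any field that contains a comma, so a name such as `JOHNSON,BRIAN S` stays in a single column.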

I have simplified the question to ask only how to extract any 1-2 fields from row 1.

Open https://officialrecords.broward.org/AcclaimWeb/search/SearchTypeName

https://officialrecords.broward.org/AcclaimWeb/search/SearchTypeDocType

and see for yourself.

1 Answer:

Answer 0: (score: 1)

Two things. First, I was getting this error:

selenium.common.exceptions.ElementNotVisibleException: Message: element not visible

I added two lines of code around the existing line to fix it:

...
driver.execute_script("return arguments[0].scrollIntoView();", driver.find_element_by_id("btnSearch"))
driver.find_element_by_id("btnSearch").click()
sleep(10) # maybe it doesn't need this long
....
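A fixed `sleep(10)` either wastes time or is too short on a slow day. One common alternative is to poll for a condition with a timeout. The helper below is a minimal, library-independent sketch of that idea for Python 3; the name `wait_until` and its parameters are assumptions for illustration. Selenium itself ships `WebDriverWait` in `selenium.webdriver.support.ui` for the same purpose, which is probably the better choice in real code.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise RuntimeError("condition not met within %s seconds" % timeout)
        time.sleep(interval)


# Hypothetical usage with the driver from the question (Selenium 3 API):
# rows = wait_until(lambda: driver.find_elements_by_tag_name("tr"))
```

The script then continues as soon as the result rows appear and gives up only after the full timeout.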

Second, it seems to me that the class t-state-selected is not applied to a row until that row is clicked, so I changed:

Row = soup.findAll("tr", { "class" : "t-state-selected" })[0].findAll("td")

to:

Row = soup.findAll("tr")[1].findAll("td")

Now the output is:

JOHNSON,BRIAN S
08/01/2017 02:05:25
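Indexing with `[1]` assumes that the first data row is always the second `tr` on the page, and it raises the same IndexError when no results come back. A more defensive variant keeps only rows that contain `td` cells and checks that the list is non-empty before indexing. The sketch below illustrates that idea for Python 3 on a hypothetical HTML fragment. To stay dependency-free it uses the standard library's `html.parser` rather than BeautifulSoup, and the names `TableRows` and `first_data_row`, as well as the fragment's structure, are assumptions rather than the site's real markup.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows with only <th> cells are skipped
                self.rows.append(self._row)
            self._row = None


def first_data_row(html):
    """Return the cells of the first row that has <td> cells, or None."""
    parser = TableRows()
    parser.feed(html)
    return parser.rows[0] if parser.rows else None


# Hypothetical fragment; the real page's markup may differ.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Record Date</th></tr>
  <tr><td>JOHNSON,BRIAN S</td><td>08/01/2017 02:05:25</td></tr>
</table>
"""

row = first_data_row(SAMPLE)
if row is None:
    print("no result rows found")
else:
    print(row[0])
    print(row[1])
```

In the original script the same check could be applied to `driver.page_source` after the results have loaded.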