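How can I extract text with Selenium WebDriver when the page source is not fully displayed? Or
how would you handle this with your own tools?
It looks like this website blocks manual extraction of the fields
and only allows manually downloading a pre-processed CSV file.
When I try to inspect the search data in the page source (Ctrl + U or Strg + U), I cannot see the table.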
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from time import sleep
# driver = webdriver.PhantomJS()
driver = webdriver.Chrome()
driverurl = "https://officialrecords.broward.org/AcclaimWeb"
driver.get(driverurl)
driver.find_element_by_id("btnButton").click()
Name = "John"
DocType = "DEED TRANSFERS OF REAL PROPERTY (D)"
RecordDate = "8/1/2017"
driver.find_element_by_id("SearchOnName").send_keys(Name)
driver.find_element_by_id("DocTypesDisplay-input").clear()
sleep(1)
driver.find_element_by_id("DocTypesDisplay-input").send_keys(DocType)
driver.find_element_by_id("RecordDateFrom").clear()
driver.find_element_by_id("RecordDateFrom").send_keys(RecordDate)
driver.find_element_by_id("RecordDateTo").clear()
driver.find_element_by_id("RecordDateTo").send_keys(RecordDate)
driver.find_element_by_id("btnSearch").click()
html = driver.page_source
soup = BeautifulSoup(html, "lxml")
Row = soup.findAll("tr", { "class" : "t-state-selected" })[0].findAll("td")
SearchedName = Row[1].get_text()
RecordDate = Row[5].get_text()
print SearchedName
print RecordDate
# driver.close()
Traceback (most recent call last):
  File "test.py", line 34, in <module>
    Row = soup.findAll("tr", { "class" : "t-state-selected" })[0].findAll("td")
IndexError: list index out of range
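There are more elements in the list.
I want to save the generated CSV file automatically, or write a bot that goes through all the rows and saves them to my own CSV file.
I have simplified the problem and am only asking how to extract any one or two fields from row 1.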
Open https://officialrecords.broward.org/AcclaimWeb/search/SearchTypeName
or
https://officialrecords.broward.org/AcclaimWeb/search/SearchTypeDocType
and see for yourself.
Answer 0 (score: 1)
Two things. First, I was getting this error:
selenium.common.exceptions.ElementNotVisibleException: Message: element not visible
I added two lines of code around the existing click line to fix it:
...
driver.execute_script("return arguments[0].scrollIntoView();", driver.find_element_by_id("btnSearch"))
driver.find_element_by_id("btnSearch").click()
sleep(10) # maybe it doesn't need this long
....
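The fixed `sleep(10)` could be replaced by polling until the result rows actually exist. The following is only a sketch for Python 3, not tested against this site: `wait_for_rows` is a hypothetical helper, and the `tr` selector and the minimum of two rows (a header row plus one result) are assumptions about the results grid. It relies only on the driver having Selenium's `find_elements(by, value)` method.

```python
import time


def wait_for_rows(driver, css="tr", minimum=2, timeout=15.0, poll=0.5):
    """Poll until at least `minimum` elements match `css`, then return them.

    `driver` only needs a Selenium-style find_elements(by, value) method.
    The selector and the minimum count are assumptions about the page.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
        rows = driver.find_elements("css selector", css)
        if len(rows) >= minimum:
            return rows
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "only %d element(s) matched %r after %.1f s"
                % (len(rows), css, timeout)
            )
        time.sleep(poll)
```

Calling `wait_for_rows(driver)` right after the click and before reading `driver.page_source` returns as soon as the rows appear instead of always waiting ten seconds. Selenium's own `WebDriverWait` class in `selenium.webdriver.support.ui` offers the same polling pattern and is probably the better choice in real code.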
Also, it looks to me like the class t-state-selected is not applied to the element until it is clicked, so I changed:
Row = soup.findAll("tr", { "class" : "t-state-selected" })[0].findAll("td")
to
Row = soup.findAll("tr")[1].findAll("td")
Now the output is:
JOHNSON,BRIAN S
08/01/2017 02:05:25
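The question also asks about going through all rows and saving them to a CSV file. Here is a rough sketch of that step for Python 3, under some assumptions: the results appear in `driver.page_source` as plain `tr`/`td` markup without nested tables, and the names `TableRows`, `parse_rows` and `rows_to_csv` are made up for this example. It uses the standard library's `html.parser` in place of BeautifulSoup only to avoid a third-party dependency; the same loop can be written with `soup.findAll("tr")`.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the <tr> being read, or None
        self._cell = None   # text pieces of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # skip rows without <td>, e.g. a <th> header row
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return all data rows in `html` as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Return `rows` formatted as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

With the live page this would be `rows = parse_rows(driver.page_source)` once the results have loaded, followed by writing `rows_to_csv(rows)` to a file opened with `newline=""`. Going by the question's code, the name and the record date would then be at `row[1]` and `row[5]` of each row.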