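I am still scraping this website: http://bombayhighcourt.nic.in/party_query.php.
I want to submit the search form with every possible combination of dropdown values.
Here is the relevant part of my code: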
from selenium import webdriver
from selenium.webdriver.support.ui import Select

class Scraper(webdriver.Chrome):
    def __init__(self, url):
        self.url = url
        self.mydriver = webdriver.Chrome()
        self.mydriver.get(self.url)

    def chooseDropdownOption(self, xpath, option):
        # Select the <option> whose value attribute equals `option`
        dropdown = self.mydriver.find_element_by_xpath(xpath)
        Select(dropdown).select_by_value(option)

    def getDropdownOptions(self, xpath):
        # Return the <option> WebElements of the dropdown at `xpath`
        dropdown = self.mydriver.find_element_by_xpath(xpath)
        return Select(dropdown).options
(...)
# now I extract all options from each dropdown:
bench_options = s.getDropdownOptions("/html/body/form/table[2]/tbody/tr/td[2]/select")
jurisd_options = s.getDropdownOptions("/html/body/form/table[2]/tbody/tr/td[4]/select")
petition_options = s.getDropdownOptions("/html/body/form/table[4]/tbody/tr/td[2]/select")

# and now I am looping through each list:
for bench_option in bench_options:
    # LOOP 1
    current_bench_option = bench_option.get_attribute('value')
    for jurisd_option in jurisd_options:
        # LOOP 2
        current_jurisd_option = jurisd_option.get_attribute("value")
        for petition_option in petition_options:
            # LOOP 3
            current_petition_option = petition_option.get_attribute("value")
            for year in range(year_start, year_final+1):
                # LOOP 4
                s.chooseDropdownOption("/html/body/form/table[2]/tbody/tr/td[2]/select", current_bench_option)
                s.chooseDropdownOption("/html/body/form/table[2]/tbody/tr/td[4]/select", current_jurisd_option)
                s.inputText("/html/body/form/table[3]/tbody/tr/td[2]/input[2]", name)
                s.chooseDropdownOption("/html/body/form/table[4]/tbody/tr/td[2]/select", current_petition_option)
                s.chooseDropdownOption("/html/body/form/table[4]/tbody/tr/td[4]/select", str(year))
                s.clickButton("/html/body/form/table[5]/tbody/tr[1]/td/input[1]")
                # DO SCRAPING PART
                # GO BACK to THE SEARCH PAGE
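I don't know whether there is a cleaner way to do this kind of task, but that is not my question. My understanding of this program is as follows:
However, once the range of years is exhausted, it correctly returns to the search page, but then I get this error message: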
---------------------------------------------------------------------------
StaleElementReferenceException Traceback (most recent call last)
<ipython-input-34-09f2da0eea05> in <module>()
27
28 for petition_option in petition_options:
---> 29 current_petition_option = petition_option.get_attribute("value")
30 print(current_petition_option)
31
(...)
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=55.0.2883.87)
(Driver info: chromedriver=2.26.436362 (5476ec6bf7ccbada1734a0cdec7d570bb042aa30),platform=Windows NT 6.1.7601 SP1 x86_64)
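Something in my understanding is wrong, but I can't find it...
Answer 0 (score: 0)
I don't know why, but posting a question on SO often has an amazing effect on my brain, and I find the solution right away :) When I get the options from a dropdown, I should immediately convert them to their values. That's it.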
bench_options = s.getDropdownOptions("/html/body/form/table[2]/tbody/tr/td[2]/select")
bench_options = [x.get_attribute("value") for x in bench_options]
jurisd_options = s.getDropdownOptions("/html/body/form/table[2]/tbody/tr/td[4]/select")
jurisd_options = [x.get_attribute("value") for x in jurisd_options]
petition_options = s.getDropdownOptions("/html/body/form/table[4]/tbody/tr/td[2]/select")
petition_options = [x.get_attribute("value") for x in petition_options]
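As Andersson pointed out below, every time I select a value from "Bench" or "Jurisdiction", a new HTTP request is sent to the server, so I get a new page. This means that the web elements located before the selection are no longer attached to the DOM, so I can't work with them any more and have to locate them again.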
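The same idea can be sketched a bit more generally: read every dropdown's values into plain strings once, before anything is selected, and only then iterate over the combinations. This is a rough sketch, not the code I actually ran. The helper names `dropdown_values` and `search_combinations` are made up for illustration, and it assumes `driver` is a Selenium WebDriver (or anything with the same `find_element(by, value)` interface).

```python
from itertools import product

XPATH = "xpath"        # same string as selenium.webdriver.common.by.By.XPATH
TAG_NAME = "tag name"  # same string as By.TAG_NAME


def dropdown_values(driver, xpath):
    """Return the option values of the <select> at `xpath` as plain strings."""
    select = driver.find_element(XPATH, xpath)
    options = select.find_elements(TAG_NAME, "option")
    # Strings survive a page reload; the WebElement objects do not.
    return [option.get_attribute("value") for option in options]


def search_combinations(driver, select_xpaths, years):
    """Snapshot all dropdown values now, then return every combination."""
    value_lists = [dropdown_values(driver, xp) for xp in select_xpaths]
    # product() stands in for the nested for-loops; years become strings
    # because select_by_value() expects a string.
    return product(*value_lists, [str(y) for y in years])
```

Inside the loop over `search_combinations(...)`, each value would then be passed to `s.chooseDropdownOption(xpath, value)`, which locates the dropdown again by XPath on every call, so no reference from an earlier page load is reused.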