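Whenever I try to scrape the Eastleigh website, it does almost everything it needs to: it goes to the URL, automatically clicks "Advanced Settings", fills in the decision dates, clicks Search, and collects all the href links. But when it tries to visit them, it fails... why? All it needs to do is follow the href links, but it won't ;-;
Can anyone help me fix this?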
Code:
import sys
import time
import config
import datetime
from selenium import webdriver

print("1. Custom Date")
print("2. Last Week")
choice = input("#: ")
if choice == "1":
    print("Start Example: 1/8/2018")
    startDate = input("Start Date: ")
    print("Stop Example: 30/8/2018")
    stopDate = input("Stop Date: ")
elif choice == "2":
    sd = str(datetime.datetime.today().day) # Gets day of the month
    sm = str(datetime.datetime.today().month) # Gets month of the year
    sy = str(datetime.datetime.today().year) # Gets year
    nsd = int(sd) # Turns string variable "sd" into an integer
    startDate = "%s/%s/%s" % (nsd-7, sm, sy) # Makes a new date string. Minus 7 off of the original date to go back 1 week
    stopDate = "%s/%s/%s" % (nsd-1, sm, sy) # Makes a new date string. Minus 1 off of the original date, (Minusing 1 was Steve's idea, not mine.)
else:
    print("This is not a choice.")
    print("Press Enter to exit...")
    input("")
    sys.exit()

url = "https://planning.eastleigh.gov.uk/s/public-register"
driver = webdriver.Chrome(executable_path=r"C:\Users\Goten\Desktop\chromedriver.exe")
driver.get(url)
time.sleep(2)
driver.find_element_by_xpath("(//li[@class='slds-tabs_default__item'])[1]").click()
driver.find_element_by_id("728:0").click() # This changes for some reason... I cannot quite find a way to make it stay the same...
driver.find_element_by_id("728:0").send_keys(startDate)
driver.find_element_by_id("744:0").click() # This also changes
driver.find_element_by_id("744:0").send_keys(stopDate)
driver.find_element_by_xpath("(//button[@name='submit'])[2]").click()
time.sleep(2)
driver.find_element_by_xpath("//*[text()='View More']").click()

result = []
elements = driver.find_elements_by_css_selector(".slds-truncate a")
links = [link.get_attribute("href") for link in elements]
result.extend(links)
print(result)
for link in result:
    result.remove(link)
    driver.get(link)
    for i in range(1):
        div = driver.find_element_by_id("slds-form-element__group").text
        log = open("log.txt", "a")
        log.write(div + "\n")
    log.write("\n")
#driver.close()
Answer 0 (score: 1)
You can't actually rely on those IDs, because they are generated dynamically, as you noticed yourself. A hacky workaround is:
...
inputs = driver.find_elements_by_class_name(" input")
received_from_index = 1
received_to_index = 2
decision_from_index = 3
decision_to_index = 4
received_from = inputs[received_from_index]
received_to = inputs[received_to_index]
received_from.clear()
received_from.send_keys(startDate)
received_to.clear()
received_to.send_keys(stopDate)
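This fills in the fields (I'm not sure whether all of them need to be filled in). After that, your script submits correctly and reaches the results page.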
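A side note on the startDate and stopDate values sent to those fields: the "Last Week" branch in the question subtracts from the day number directly, so on the 3rd of a month it would produce something like -4/9/2018. Below is a minimal sketch that lets datetime.timedelta handle the month and year boundaries instead. The function name last_week_range and its today parameter are my own, not part of the original script, and the D/M/YYYY format is assumed from the question's "1/8/2018" example.

```python
import datetime


def last_week_range(today=None):
    """Return (start, stop) as D/M/YYYY strings: 7 days ago and yesterday.

    `today` is a hypothetical parameter added so the function can be
    checked with a fixed date; it defaults to the current date.
    """
    if today is None:
        today = datetime.date.today()
    start = today - datetime.timedelta(days=7)  # rolls back across month/year ends
    stop = today - datetime.timedelta(days=1)

    def fmt(d):
        # Same unpadded day/month/year layout as the question's examples
        return "%d/%d/%d" % (d.day, d.month, d.year)

    return fmt(start), fmt(stop)


startDate, stopDate = last_week_range(datetime.date(2018, 9, 3))
print(startDate, stopDate)  # 27/8/2018 2/9/2018
```

In the script you would call last_week_range() with no argument inside the choice == "2" branch.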
You will need to change this part of your code:
for link in result:
    result.remove(link)
    driver.get(link)
...
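One problem with this loop is independent of the website: it removes items from result while iterating over it, so Python skips every other link. This may not be the only reason the visits fail, but it is easy to confirm without a browser. The sketch below uses plain strings in place of URLs; visit_buggy and visit_fixed are illustrative names of mine, not from the question.

```python
def visit_buggy(links):
    """Mimic the question's loop: remove each link while iterating."""
    visited = []
    for link in links:
        links.remove(link)    # shifts the remaining items left by one
        visited.append(link)  # stands in for driver.get(link)
    return visited


def visit_fixed(links):
    """Iterate over a copy so the original list can change safely."""
    visited = []
    for link in list(links):
        visited.append(link)
    return visited


sample = ["a", "b", "c", "d"]
print(visit_buggy(list(sample)))  # ['a', 'c']
print(visit_fixed(list(sample)))  # ['a', 'b', 'c', 'd']
```

If you do not need result afterwards, simply dropping the result.remove(link) line has the same effect as the fixed version.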
The second part:
...
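# Imports needed by this snippet (place them at the top of the script)
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver.find_element_by_xpath("//*[text()='View More']").click()
links = driver.find_elements_by_xpath("""//*[@id="arcusbuilt__PApplication__c"]/div/div/h4/a""")
print("Total of planning applications:", len(links))
# Re-locate each link by position on every pass, since element references
# found earlier can go stale after navigating away and back
for link_index in range(1, len(links) + 1):
    result_link = driver.find_element_by_xpath("""//*[@id="arcusbuilt__PApplication__c"]/div[%d]/div/h4/a""" % link_index)
    result_link.click()
    print("visiting link %d" % link_index)
    # Wait up to 20 seconds for the detail page's back button to appear
    WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.NAME, "BackButton")))
    time.sleep(3)
    #DO WHAT YOU NEED HERE...
    back_btn = driver.find_element_by_name("BackButton")
    back_btn.click()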
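For the "DO WHAT YOU NEED HERE" step, the question's code appended the scraped text to log.txt but opened the file on every pass without ever closing it. Here is a small sketch of the same logging with a context manager; append_record is a hypothetical helper name, and the temporary path is only there so the example runs on its own.

```python
import os
import tempfile


def append_record(path, text):
    """Append one record followed by a blank line, then close the file."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(text + "\n\n")


# Demo with a throwaway file; in the script you would pass "log.txt"
demo_path = os.path.join(tempfile.mkdtemp(), "log.txt")
append_record(demo_path, "first application")
append_record(demo_path, "second application")

with open(demo_path, encoding="utf-8") as f:
    demo_content = f.read()
print(demo_content)
```

Inside the loop you would call it with the text of whatever element you read from the detail page.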
Good luck!