Download all the hidden links from a map-based website

Time: 2018-07-24 20:50:54

Tags: selenium web-scraping

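There is a website that shows links on a map (the map layer currently fails to display, but the links still appear as points). To reach it, follow these steps (Pictures 1-2-3 also show the way):

First, open 'http://svtbilgi.dsi.gov.tr/Sorgu.aspx' (Picture 1).

Second, select '15. Kizilirmak Havzasi' in the 'Havza' drop-down (Picture 2).

Finally, click 'sorgula' at the bottom (Picture 3).

After the last step, you should see the page that shows these points on the map ('http://svtbilgi.dsi.gov.tr/HaritaNew.aspx') (Picture 4).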

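Normally I can download a web page with Selenium, or collect all of its links with other libraries. Those methods do not work here, because the links are embedded in an almost hidden way rather than written into the page as ordinary links.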
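To illustrate why the usual approach comes up empty, here is a minimal sketch using only the standard library. The sample markup is hypothetical: it assumes the markers are plain `div`/`img` elements (the `OL_Icon` id pattern is taken from the script in this question) whose targets are known only to the page's JavaScript, which may not match the real page exactly.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a static HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all anchor hrefs found in the given HTML text."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: one ordinary link, and one map marker that has
# no href at all, so a static link extractor cannot see its target.
SAMPLE_HTML = """
<a href="http://example.com/station/1">Station 1</a>
<div id="OL_Icon_42"><img src="marker.png"></div>
"""

print(extract_links(SAMPLE_HTML))
```

Only the ordinary anchor is reported; the marker contributes nothing, which is why each point has to be clicked in a real browser to find out where it leads.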

I want to download the links behind all of these points.

For example, this script does not continue after the line "parent_handle = driver.current_window_handle". I don't know why.

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait


driver = webdriver.Firefox(executable_path=r'D:\geckodriver.exe')
driver.get("http://svtbilgi.dsi.gov.tr/Sorgu.aspx")
driver.find_element_by_id("ctl00_hld1_cbHavza").click()
Select(driver.find_element_by_id("ctl00_hld1_cbHavza")).select_by_visible_text("15. Kizilirmak Havzasi")
driver.find_element_by_id("ctl00_hld1_cbHavza").click()
driver.find_element_by_id("ctl00_hld1_btnListele").click()
parent_handle = driver.current_window_handle
all_urls = []
all_images = driver.find_elements_by_xpath("//div[contains(@id,'OL_Icon')]/img")
for image in all_images :
     image.click()
     for handle in driver.window_handles :
          if handle != parent_handle:
              driver.switch_to_window(handle)
              WebDriverWait(driver, 5).until(lambda d: d.execute_script('return document.readyState') == 'complete')
              all_urls.append(driver.current_url)
              driver.close()
              driver.switchTo.window(parent_handle)

1 answer:

Answer 0: (score: 0)

Why not click them one by one and then read the URL of each opened window with driver.current_url (getCurrentUrl() in the Java bindings)?

In the code below, I first wait for all the marker images to be present and then click each one through the ActionChains class, since the normal Selenium click() wasn't working.

Complete code in Python:

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Selenium 3 style constructor; newer releases take a Service object instead.
driver = webdriver.Chrome(executable_path=r'D:\Test automation\chromedriver.exe')
driver.get("http://svtbilgi.dsi.gov.tr/Sorgu.aspx")

# Pick the basin and submit the query form.
driver.find_element(By.ID, "ctl00_hld1_cbHavza").click()
Select(driver.find_element(By.ID, "ctl00_hld1_cbHavza")).select_by_visible_text("15. Kizilirmak Havzasi")
driver.find_element(By.ID, "ctl00_hld1_btnListele").click()
parent_handle = driver.current_window_handle
driver.maximize_window()
all_urls = []

# Wait until the marker images are present in the DOM.
all_images = WebDriverWait(driver, 15).until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@id,'OL_Icon')]/img")))
print(len(all_images))
for image in all_images:
    # A plain click() did not work here, so click through ActionChains.
    webdriver.ActionChains(driver).move_to_element(image).click(image).perform()
    for handle in driver.window_handles:
        if handle != parent_handle:
            driver.switch_to.window(handle)
            # Let the new window finish loading before reading its URL.
            WebDriverWait(driver, 15).until(lambda d: d.execute_script('return document.readyState') == 'complete')
            all_urls.append(driver.current_url)
            driver.close()
            driver.switch_to.window(parent_handle)

print(all_urls)
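
The window-handling part of that loop can also be pulled out into a small helper, which makes it easier to reuse and to check without a browser. This is only a sketch: `collect_popup_urls` and its `click` parameter are hypothetical names introduced here, and the function assumes that `driver` behaves like a Selenium WebDriver (`current_window_handle`, `window_handles`, `switch_to.window()`, `current_url`, `close()`) and that each click opens its target in a new window.

```python
def collect_popup_urls(driver, elements, click):
    """Click each element and record the URL of every window it opens.

    `driver` is expected to behave like a Selenium WebDriver; `click` is a
    callable that performs the click on one element.
    """
    parent = driver.current_window_handle
    urls = []
    for element in elements:
        click(element)
        # Snapshot the handles, since the set changes as windows are closed.
        for handle in list(driver.window_handles):
            if handle == parent:
                continue
            driver.switch_to.window(handle)
            urls.append(driver.current_url)
            driver.close()
        # Always return to the map window before the next click.
        driver.switch_to.window(parent)
    return urls
```

With the script above it could be called as collect_popup_urls(driver, all_images, lambda el: webdriver.ActionChains(driver).move_to_element(el).click(el).perform()). The page-load wait from the full script is left out to keep the sketch short and would need to go back in before reading current_url on a slow site.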