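I want to download the file called "sprawozdanie merytoryczne" for 2017 for every organization, in a Python loop. To download one file manually, you go to http://sprawozdaniaopp.mpips.gov.pl/, click the "Znajdź" button, and then click an organization's name; a modal box appears with a "sprawozdanie merytoryczne" link for that organization. I want to automate this for all organizations, but I have run into a problem. On the first pass through the loop everything works and the first file is downloaded. On the second pass the modal window opens, but the script does not find "sprawozdanie merytoryczne" even though it is there. I think the problem is with switching between windows. I would be very grateful for any help. Here is my code: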
import urllib
import urllib.request
import requests
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
import re
import unicodecsv # import whole module
from bs4 import BeautifulSoup # import only things that we need
import time
import smtplib
from selenium import webdriver
chrome_path = r"C:\Users\username\AppData\Local\Programs\Python\Python35-32\Scripts\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("http://sprawozdaniaopp.mpips.gov.pl/")
rok = driver.find_element_by_xpath("//*[@id='instanceYear']")
rok.send_keys('2017')
wojewodztwo = driver.find_element_by_xpath("//*[@id='Province']")
wojewodztwo.clear()
wojewodztwo.send_keys('MAZOWIECKIE')
elem = driver.find_element_by_xpath("//*[@id='btnsearch']/span")
elem.click()
for i in range(1, 1348):
    winhandle = driver.current_window_handle
    p1 = r'#form1 > div > div.grid > table > tbody > tr:nth-child('
    p2 = ') > td:nth-child(3) > a'
    p3 = p1 + str(i) + p2
    elem1 = driver.find_element_by_css_selector(p3)
    p1 = r'#form1 > div > div.grid > table > tbody > tr:nth-child('
    p2 = ') > td:nth-child(5)'
    p3 = p1 + str(i) + p2
    miejscowosc = driver.find_element_by_css_selector(p3)
    print(miejscowosc.text)  # miejscowosc means city
    miejscowosc1 = miejscowosc.text
    p1 = r'#form1 > div > div.grid > table > tbody > tr:nth-child('
    p2 = ') > td:nth-child(4)'
    p3 = p1 + str(i) + p2
    wojewodztwo = driver.find_element_by_css_selector(p3)
    elem1.click()
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".ui-dialog.ui-widget.ui-widget-content.ui-corner-all")))
    try:
        elem2 = driver.find_element_by_link_text("Sprawozdanie merytoryczne").click()
        organizationName = driver.find_elements_by_class_name("td1")
        orgname = str(organizationName[11].text)
        orgname1 = orgname.replace('"', "")
        print(organizationName[11].text)
        driver.switch_to.window(driver.window_handles[1])
        urltemp = driver.current_url
        urltodownload = requests.get(urltemp)
        path1 = r'C:/Users/adunajsk/Desktop/pdf17maz/'
        path2 = '.pdf'
        path3 = path1 + orgname1 + path2
        print(path3)
        with open(path3, 'wb') as f:
            f.write(urltodownload.content)
        driver.close()
        del organizationName[:]
    except NoSuchElementException:
        print("Plik nie istnieje")  # "The file does not exist"
    driver.switch_to.window(winhandle)
    WebDriverWait(driver, 8).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "body > div.ui-dialog.ui-widget.ui-widget-content.ui-corner-all > div.ui-dialog-titlebar.ui-widget-header.ui-corner-all.ui-helper-clearfix > a > span")))
    closebutton = driver.find_element_by_css_selector("body > div.ui-dialog.ui-widget.ui-widget-content.ui-corner-all > div.ui-dialog-titlebar.ui-widget-header.ui-corner-all.ui-helper-clearfix > a")
    closebutton.click()
Answer 0 (score: 0)
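The problem is that once a modal dialog has been opened, it stays in the DOM even after you close it. When you open the second dialog, your locator still matches the first (now hidden) one and tries to click the link there. You can also configure the driver to download the PDF directly instead of opening it.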
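To illustrate the idea of ignoring the stale dialogs, here is a minimal sketch that picks the last dialog that is actually displayed. It only relies on the `is_displayed()` method that Selenium's `WebElement` provides; the helper name `last_visible` and the `FakeDialog` stand-in are hypothetical and exist only so the snippet runs without a browser. I have not run this against the site itself.

```python
class FakeDialog:
    """Hypothetical stand-in for a Selenium WebElement, used only for this demo."""

    def __init__(self, name, displayed):
        self.name = name
        self._displayed = displayed

    def is_displayed(self):
        # Real WebElement objects expose the same method.
        return self._displayed


def last_visible(elements):
    """Return the last element whose is_displayed() is true, or None if there is none."""
    visible = [el for el in elements if el.is_displayed()]
    return visible[-1] if visible else None


# With a real driver the list would come from something like
# driver.find_elements_by_css_selector("div.ui-dialog") (selector assumed from the question).
dialogs = [
    FakeDialog("first organization", False),   # closed earlier, still in the DOM
    FakeDialog("second organization", True),   # the dialog that is open now
]
current = last_visible(dialogs)
print(current.name)
```

Once you have the visible dialog, search for the link inside that element rather than from `driver`, so that hidden copies of the link are never matched.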
Here is the code:
Note: I wrote and tested this in Java, so the Python code may contain syntax errors.
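from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set Chrome options to download PDFs instead of opening them in the browser; this removes the need to handle windows and makes it much faster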
options = webdriver.ChromeOptions()
downloadPath = r'C:\Users\username\Downloads'
profile = {"plugins.plugins_list": [{"enabled":False,"name":"Chrome PDF Viewer"}],"download.default_directory" : downloadPath}
options.add_experimental_option("prefs",profile)
driver = webdriver.Chrome(r"C:\Users\username\AppData\Local\Programs\Python\Python35-32\Scripts\chromedriver.exe", chrome_options=options)
driver.get("http://sprawozdaniaopp.mpips.gov.pl/")
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, 'Province'))).send_keys('MAZOWIECKIE')
driver.find_element_by_id('instanceYear').send_keys('2017')
driver.find_element_by_id('btnsearch').click()
# After the search, wait until the table has loaded rows with a column containing the text MAZOWIECKIE
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//table[@class="webgrid"]/tbody//td[normalize-space(.)="MAZOWIECKIE"]')))
# Get all rows and iterate through them, so the code is dynamic and does not depend on the number of rows
rows = driver.find_elements_by_css_selector('table.webgrid tbody tr')
for row in rows:
    # Get the value of the KRS column
    krs = row.find_element_by_css_selector('td:nth-child(2)').text
    # Click the link in the Nazwa column
    row.find_element_by_css_selector('td:nth-child(3) a').click()
    # Find the modal box DIV element that contains the KRS number of the clicked row. Alternatively, you can get all modal boxes and pick the visible one.
    modalBoxLocator = "(//table[@id='tbldetails']//td[contains(.,'" + krs + "')]/ancestor::div[contains(@class,'ui-dialog')][2])[last()]"
    modalBox = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, modalBoxLocator)))
    # Find the TD with the text 2017, then click the first "Sprawozdanie merytoryczne" link after it
    modalBox.find_element_by_xpath('.//tr[./td[.="2017"]]/following-sibling::tr[.//a[.="Sprawozdanie merytoryczne"]][1]//a').click()
    # Close the modal box
    modalBox.find_element_by_css_selector('a.ui-dialog-titlebar-close').click()
    # if len(modalBox.find_elements_by_css_selector('a.ui-dialog-titlebar-close')) > 0:
    #     modalBox.find_element_by_css_selector('a.ui-dialog-titlebar-close').click()
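One thing the loop above does not handle: with direct downloads, the script may finish (and the browser may close) while files are still being written. Below is a rough sketch of a wait helper, assuming Chrome's usual behaviour of giving in-progress downloads a `.crdownload` suffix. The function name `wait_for_downloads` and its parameters are my own, not part of Selenium, and the demo uses a temporary directory instead of a real download folder.

```python
import os
import tempfile
import time


def wait_for_downloads(directory, timeout=30.0, poll=0.5):
    """Return True once no *.crdownload files remain in directory, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        pending = [name for name in os.listdir(directory) if name.endswith(".crdownload")]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Demo with a temporary directory that holds one finished file.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "report.pdf"), "wb").close()
print(wait_for_downloads(demo_dir, timeout=1.0))
```

In the script you would call it with `downloadPath` after the loop, before quitting the driver. Note that it also returns True if a download has not started yet, so a short pause after the last click may still be needed.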