I can't download PDF files from links generated by a JavaScript function, using Python 3.6.0 + Selenium 3.4.3

Asked: 2017-09-11 23:46:39

Tags: javascript python selenium download screen-scraping

The URL is: site

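Using Selenium with the Firefox 47.0.2 binary and Python 3.6.0, I click the "Pesquisar" button on this page. On the next page I fill in the date range (in d/m/y format) and click the new "Pesquisar" button. That gives me a list of PDF documents, which I want to download.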

When I print page_source I can see the generated links, but I don't understand why Selenium can't find them.
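One way to confirm exactly which ids the server rendered is to pull them out of `driver.page_source` with a regular expression and compare them, character by character, with the id used in the locator. The sketch below is only illustrative: the helper name `ids_ending_with` and the sample HTML are made up, and in the real script the string would come from `driver.page_source`.

```python
import re

def ids_ending_with(html, suffix):
    """Return every id attribute value in html that ends with suffix."""
    # Matches id="..." or id='...' and captures the attribute value.
    ids = re.findall(r'''\bid\s*=\s*["']([^"']+)["']''', html)
    return [i for i in ids if i.endswith(suffix)]

# Hypothetical fragment standing in for driver.page_source.
sample = """
<a id="grid_ctl02_lnkData" href="#">01/08/2017</a>
<a id="grid_ctl03_lnkData" href="#">02/08/2017</a>
<span id="grid_lblNomeCaderno">Caderno</span>
"""

link_ids = ids_ending_with(sample, "_lnkData")
print(link_ids)
```

If the id used in the locator is not in the returned list, the locator rather than the page load is the more likely culprit.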

The simplified code is below:

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from datetime import datetime, date, timedelta
from calendar import monthrange
import time


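# profile, binary and capabilities are set up in the full script (omitted from this simplified version)
driver = webdriver.Firefox(firefox_profile=profile, firefox_binary=binary, capabilities=capabilities)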
driver.maximize_window()
wait = WebDriverWait(driver, 10)

months = range(1, 13)
# monthrange returns (weekday of the 1st, number of days in the month),
# so only the second value is a day of the month; the first day is always 1
_, last_day = monthrange(2017, 8)

date_input_begin = '{num:0{width}}'.format(num=1, width=2) + '08' + '2017'
date_input_end = '{num:0{width}}'.format(num=last_day, width=2) + '08' + '2017'

today = datetime.now().date()
date = today

date = date - timedelta(24)

driver.get("http://dje.trf2.jus.br/DJE/Paginas/Externas/inicial.aspx")

driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrInicial_btnPesquisar").click()

wait.until(EC.presence_of_element_located(
    (By.XPATH, '//*[@id="ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar"]')))

select1 = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_ddlAreaJudicial"))
select1.select_by_index(3)

select2 = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_ddlRegistrosPaginas"))
select2.select_by_index(6)

element_date_begin = driver.find_element_by_id(
    'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_tbxDataInicial')
element_date_begin.clear()
element_date_begin.send_keys(date_input_begin)

element_date_end = driver.find_element_by_id(
    'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_tbxDataFinal')
element_date_end.clear()
element_date_end.send_keys(date_input_end)

driver.find_element_by_id('ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar').submit()

wait.until(EC.presence_of_element_located((By.ID, 'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar')))
wait.until(EC.element_to_be_clickable((By.ID, 'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar')))

time.sleep(5)
driver.find_element_by_id('ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar').click()

wait.until(EC.presence_of_element_located(
    (By.XPATH, '//*[@id="ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_lblNomeCaderno"]')))

driver.find_element_by_xpath(
    '//*[@id="ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_grvCadernos_ct102_lnkData"]').click()

But when I look up the link by ID or XPath, I get the following error:

  

File "C:\Users\b2002032064079\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"//*[@id=\"ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_grvCadernos_ct102_lnkData\"]"}
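When a locator fails even though the link is visible in the page source, a near-miss in the id is one possible cause. The sketch below uses `difflib.get_close_matches` from the standard library to find the rendered ids most similar to the one that failed. The list of rendered ids is hypothetical (the real page's ids are not shown here); it only illustrates how a `ct102` (digit one) versus `ctl02` (letter l) mix-up would surface. Whether that is the actual problem on this page is an assumption to verify against page_source.

```python
from difflib import get_close_matches

def closest_ids(wanted, rendered_ids, n=3):
    """Return up to n rendered ids most similar to the wanted id."""
    return get_close_matches(wanted, rendered_ids, n=n, cutoff=0.8)

# Shortened, hypothetical ids for illustration only.
wanted = "grvCadernos_ct102_lnkData"
rendered = [
    "grvCadernos_ctl02_lnkData",
    "grvCadernos_ctl03_lnkData",
    "lblNomeCaderno",
]

matches = closest_ids(wanted, rendered)
print(wanted in rendered, matches)
```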

I'm new to scraping and would really appreciate any help. Thanks!

1 Answer:

Answer 0: (score: 1)

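1. Which browser are you using?
2. Your site is slow. Try allowing more wait time.
3. Is the XPath correct? I think the problem is with the XPath. Try checking it in Chrome with the XPath Helper extension.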
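The second and third suggestions can be combined into one rough sketch: wait longer, and wait for the links themselves using a locator that does not depend on the exact row number. The helper names `links_xpath` and `wait_for_links`, the `_lnkData` fragment and the 30-second timeout are assumptions for illustration; `WebDriverWait`, `expected_conditions.presence_of_all_elements_located` and `By.XPATH` are the same Selenium APIs the question already imports.

```python
def links_xpath(id_fragment):
    """Build an XPath matching every <a> whose id contains id_fragment."""
    # XPath 1.0 (what browsers support) has no ends-with(), so use contains().
    return '//a[contains(@id, "{}")]'.format(id_fragment)

def wait_for_links(driver, id_fragment="_lnkData", timeout=30):
    """Wait up to timeout seconds for matching links; return them as a list."""
    # Imported here so links_xpath() stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, links_xpath(id_fragment))
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located(locator))

print(links_xpath("_lnkData"))
```

With a live `driver` from the question's script, `wait_for_links(driver)[0].click()` would click the first matching link. If the wait ends in a `TimeoutException`, no `<a>` element with that id fragment appeared within the timeout, which points back at the locator rather than the page speed.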