Question

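I need to automate the download of some PDF documents using Python and Selenium. The Chrome and Firefox preferences are already set up to download files automatically, but I am having trouble with one specific website: instead of downloading the document, it opens it in a new tab.

Website: http://sistemas.sefaz.ma.gov.br/certidoes/jsp/emissaoCertidaoNegativa/emissaoCertidaoNegativa.jsf Select CPF / CNPJ and use the number 22977333000108 to reach the document.

After filling in the form, I get the document in a new tab that has the same URL, as shown in the screenshot "Document Screenshot" (sorry, I had to hide the document number). If I download it manually and rename it to a PDF file, it works fine.

How can I download it as a PDF file using Selenium?

My code is as follows: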

import time

urlForm = 'https://sistemas.sefaz.ma.gov.br/certidoes/jsp/emissaoCertidaoNegativa/emissaoCertidaoNegativa.jsf'
# loadDriver is my own helper: it starts the Firefox webdriver with the download preferences
driver = loadDriver(urlForm)

# form filled

# Clicking submit button
btnSubmit = driver.find_element_by_id('form1:j_id28')
btnSubmit.click()
time.sleep(3)

# switch to new tab
driver.switch_to.window(driver.window_handles[-1])

Then I tried to pick up the downloaded file using glob, like this:

certidoes = glob.glob('emissaoCertidaoNegativa*')
print(certidoes)

But it prints something like this:

['emissaoCertidaoNegativa.jsf;jsessionid=44297D6C88452FC479FC0E94013D3C0A']

When I try to rename that file to .pdf, I get an empty file. Can anyone help?

Answer 1

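According to the documentation for glob.glob:

Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).

A relative pattern such as 'emissaoCertidaoNegativa*' is resolved against the current working directory, so you have to include the path to the file in the pattern.

Example:

Suppose we have two output.txt files in two different directories: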

C:\\Test\\output.txt
C:\\Users\\userName\\Desktop\\output.txt

Then try the following code:

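import glob

# Exact path: matches only that file
print(glob.glob('C:\\Test\\output.txt', recursive=True))
# '**' with recursive=True: descends into every subdirectory
print(glob.glob('C:\\**\\output.txt', recursive=True))
# '**' without recursive=True behaves like '*': one directory level only
print(glob.glob('C:\\**\\output.txt'))
# Bare file name: searched in the current working directory only
print(glob.glob('output.txt'))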

The output will be:

['C:\\Test\\output.txt']
['C:\\Test\\output.txt', 'C:\\Users\\userName\\Desktop\\output.txt']
['C:\\Test\\output.txt']
[]
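The same behaviour can be reproduced without touching C:\. The following is only a minimal sketch, assuming a throwaway directory created with tempfile; the helper demo_glob_patterns, the folder layout, and the file name glob_demo_output.txt are made up for illustration and are not part of the original example.

```python
import glob
import os
import tempfile

# Hypothetical file name, chosen so it is unlikely to exist in the working directory.
FILE_NAME = "glob_demo_output.txt"


def demo_glob_patterns():
    """Create two copies of FILE_NAME in a temporary tree and query them with glob."""
    with tempfile.TemporaryDirectory() as root:
        # Assumed layout: <root>/Test and <root>/Users/userName/Desktop
        for parts in (("Test",), ("Users", "userName", "Desktop")):
            folder = os.path.join(root, *parts)
            os.makedirs(folder)
            with open(os.path.join(folder, FILE_NAME), "w") as handle:
                handle.write("demo")

        def relative(paths):
            # Report paths relative to root, with forward slashes, in sorted order.
            return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)

        # glob.escape protects any special characters in the temporary path itself.
        pattern = os.path.join(glob.escape(root), "**", FILE_NAME)
        return {
            "recursive": relative(glob.glob(pattern, recursive=True)),
            "non_recursive": relative(glob.glob(pattern)),
            "bare_name": glob.glob(FILE_NAME),  # looked up in the current directory
        }


if __name__ == "__main__":
    for label, matches in demo_glob_patterns().items():
        print(label, matches)
```

The recursive pattern should list both files, the non-recursive one only the copy one level down, and the bare name nothing, unless the working directory happens to contain a file with that name.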

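So you have to specify the path to the directory where the file is saved. You can use **, but as the documentation notes:

Note: Using the "**" pattern in large directory trees may consume an inordinate amount of time.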
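If the download directory is already known, a recursive search is not needed at all. Here is a rough sketch, assuming that download_dir is whatever folder was configured in the browser preferences (the question does not say which one); newest_match is a hypothetical helper, and the default pattern is the one from the question.

```python
import glob
import os


def newest_match(download_dir, pattern="emissaoCertidaoNegativa*"):
    """Return the most recently modified file in download_dir that matches pattern.

    download_dir is an assumption: pass the folder the browser saves files to.
    Returns None when nothing matches.
    """
    # Escape the directory part so only the pattern is treated as a wildcard.
    full_pattern = os.path.join(glob.escape(download_dir), pattern)
    files = [path for path in glob.glob(full_pattern) if os.path.isfile(path)]
    if not files:
        return None
    return max(files, key=os.path.getmtime)
```

Because the pattern is anchored to a single directory, glob only lists that folder, which avoids the cost of walking a whole tree with **.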
