Question

I'm trying to download a PDF file that has the following href (I changed some values because the PDF contains personal information):

https://clients.direct-energie.com/grandcompte/factures/consulter-votre-facture/?tx_defacturation%5BdoId%5D=857AD9348B0007984D4B128F1E8BE&cHash=7b3a9f6d109dde87bd1d95b80ca1d

When I open that href in a browser, the PDF downloads directly, but when I try it with requests in my Python code, it only downloads the source code of

https://clients.direct-energie.com/grandcompte/factures/consulter-votre-facture/

Here is my code; I use Selenium to find the href on the site:

from requests import get

fact = driver.find_element_by_xpath(url)
href = fact.get_attribute('href')
print(href)      # href is correct here
reply = get(href, stream=True)
print(reply)     # I got the source code

This is the HTML that Selenium found:

<a href="grandcompte/factures/consulter-votre-factue/?tx_defacturation%5BdoId%5D=857AD9348B0007984D4B128F1E8BE&cHash=7b3a9f6d109dde87bd1d95b80ca1d"></a>

I hope this is enough information. Thanks.

Answer 1

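I couldn't use your link because it requires authentication, so I found another example of a PDF download that goes through a redirect. The Chrome settings that make it download PDFs instead of displaying them are taken from this StackOverflow answer.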

import selenium.webdriver

url = "https://readthedocs.org/projects/selenium-python/downloads/pdf/latest/"

# Chrome preferences: save PDFs to download_dir instead of opening the viewer
download_dir = 'C:/Dev'
profile = {
    "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
    "download.default_directory": download_dir,
    "download.extensions_to_open": "applications/pdf"
}

options = selenium.webdriver.ChromeOptions()
options.add_experimental_option("prefs", profile)
driver = selenium.webdriver.Chrome(options=options)

# Navigating to the URL triggers the download
driver.get(url)
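Since the browser saves the file in the background, a script usually needs to wait before reading it. Here is a minimal sketch of one way to do that. It assumes Chrome writes in-progress downloads with a ".crdownload" suffix and that the download directory holds no older PDFs. The helper name wait_for_download and its parameters are hypothetical, not part of Selenium.

```python
import time
from pathlib import Path


def wait_for_download(download_dir, timeout=30.0, poll=0.5):
    """Return the finished PDF files in download_dir, or raise TimeoutError.

    Assumption: Chrome keeps a "*.crdownload" file while a download is
    still in progress and renames it once the download completes.
    """
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        partial = list(folder.glob("*.crdownload"))
        finished = sorted(folder.glob("*.pdf"))
        if finished and not partial:
            return finished
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished PDF in {folder} after {timeout}s")
        time.sleep(poll)
```

It would be called right after driver.get(url), for example wait_for_download(download_dir).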

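According to the documentation, the driver.get method doesn't return anything; it only tells the WebDriver to navigate to a page. If you want to process the PDF in Python before saving it to a file, consider using Requests or RoboBrowser.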
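If Requests is used for the download, one likely reason for getting the page source instead of the PDF is that the request doesn't carry the login cookies held by the Selenium browser. The sketch below copies them into a requests.Session. It assumes the site authenticates through cookies alone, which I could not verify for the site in the question. The function name session_from_selenium_cookies is made up for illustration.

```python
import requests


def session_from_selenium_cookies(selenium_cookies):
    """Build a requests.Session carrying the browser's cookies.

    selenium_cookies is the list of dicts returned by driver.get_cookies().
    """
    session = requests.Session()
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            # Fall back to "" and "/" when the browser omits these fields
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session
```

Typical use would be session = session_from_selenium_cookies(driver.get_cookies()) followed by reply = session.get(href). If the server also checks headers such as User-Agent, those would have to be copied too.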

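The stream=True option belongs to Requests and doesn't apply to webdriver.Chrome, so I'm not sure whether that is the approach you're using, but the method above should do what you want.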
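For the Requests route, here is a rough sketch of what a streamed download could look like, with a check that the body really is a PDF and not an HTML page. The helper save_pdf is hypothetical. It takes any object with a Requests-style get(), such as a requests.Session that already holds the login cookies, and it assumes the first chunk is at least five bytes long.

```python
from pathlib import Path


def save_pdf(session, url, dest, chunk_size=8192):
    """Stream url to dest, refusing anything that is not a PDF.

    session is anything with a requests-style get(), e.g. requests.Session().
    """
    response = session.get(url, stream=True)
    response.raise_for_status()
    chunks = response.iter_content(chunk_size=chunk_size)
    first = next(chunks, b"")
    # A PDF file starts with the bytes "%PDF-"; an HTML page does not.
    if not first.startswith(b"%PDF-"):
        raise ValueError(f"{url} did not return a PDF")
    dest = Path(dest)
    with dest.open("wb") as fh:
        fh.write(first)
        for chunk in chunks:
            fh.write(chunk)
    return dest
```

A ValueError here would indicate the same symptom as in the question: the server answered with a page instead of the file.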

Python: downloading an href returns the source code instead of the PDF file
