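Selenium Chrome: save as PDF and change the download folder

Time: 2019-02-07 17:20:41

Tags: python selenium web-scraping

I want to save a website as a PDF file. That works, but the file should go to a specific path; instead it just ends up in my default download directory.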

import json
from selenium import webdriver

appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2,
    'download.default_directory': 'C:\\Users\\Oli\\Google Drive',
    "download.directory_upgrade": True
}

profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}

chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')

driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('https://www.google.com/')
driver.execute_script('window.print();')

Does anyone also know a way to save the file under a specific name?

3 answers:

Answer 0 (score: 2)

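download.default_directory should not be added to appState; it belongs in the "prefs" passed to add_experimental_option.

Like this:

chrome_options.add_experimental_option("prefs", {
    'download.default_directory': 'C:\\Users\\Oli\\Google Drive',
    'download.directory_upgrade': True
})

In your case this will not help, though, because this option sets the location for "File -> Save as", and you need "Print -> Save as".

As a workaround, you can use Chrome's --print-to-pdf argument. There is no need to run the Chrome WebDriver; Chrome itself can run in headless mode.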

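Be careful: it runs silently, so there is no warning message if the file is not created (for example, if the directory does not exist, if you lack administrator rights for C:\Users, or if the web page does not exist).

You can always test the resulting command at the command line (cmd) first. The script below builds the command, prints it, and then runs it: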

import os

path_to_file = 'C:\\Users\\Oli\\Google Drive\\'  # target folder; it must already exist
name_of_file = '1.pdf'
page_to_open = 'http://example.com'

# Windows only: 'start' is a cmd built-in that launches Chrome
command_to_run = 'start chrome --headless --print-to-pdf="{0}{1}" {2}'.format(path_to_file, name_of_file, page_to_open)
print('launch:' + command_to_run)

os.popen(command_to_run)
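
Because the headless run fails silently, it can help to check for the output file afterwards. The following is a minimal sketch, not part of the original answer: it uses subprocess.run instead of os.popen so that the script waits for Chrome, then verifies that the PDF exists. The function names and the chrome_path argument are hypothetical, and the sketch assumes that the Chrome process exits once the PDF has been written.

```python
import os
import subprocess


def build_print_command(chrome_path, url, output_pdf):
    """Return the argument list for a headless print-to-PDF run."""
    return [
        chrome_path,  # assumption: name or full path of the Chrome executable
        '--headless',
        '--print-to-pdf={0}'.format(output_pdf),
        url,
    ]


def print_page_to_pdf(chrome_path, url, output_pdf, timeout=60):
    """Run Chrome headlessly and raise if no PDF was written."""
    # A missing target directory makes Chrome fail silently, so check first.
    target_dir = os.path.dirname(os.path.abspath(output_pdf))
    if not os.path.isdir(target_dir):
        raise FileNotFoundError('No such directory: ' + target_dir)

    # Passing a list avoids shell quoting problems with spaces in paths.
    subprocess.run(build_print_command(chrome_path, url, output_pdf),
                   timeout=timeout)

    # Note: an older file with the same name would also pass this check.
    if not os.path.isfile(output_pdf):
        raise RuntimeError('Chrome did not create ' + output_pdf)
    return output_pdf
```

On Windows you would probably pass the full path to chrome.exe as chrome_path, since plain 'chrome' is resolved by the start command but not necessarily by subprocess.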

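Answer 1 (score: 1)

The download.default_directory setting only applies to downloaded content. Chrome handles files that are saved from a page differently. To change the default folder for page printouts, set the savefile.default_directory value instead.

So a complete example that prints to PDF in a custom location looks like this: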

import json
from selenium import webdriver

appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local",
            "account": ""
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}

profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState),
           'savefile.default_directory': 'path/to/dir/'}

chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')

driver = webdriver.Chrome(options=chrome_options)
url = 'https://www.google.com/'  # placeholder: the page to print (the URL from the question)
driver.get(url)
driver.execute_script('window.print();')
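
If several scripts need the same setup, the two preferences can be wrapped in a small helper. This is only a sketch: build_pdf_print_prefs is a hypothetical name, and converting the folder to an absolute path and creating it up front are assumptions of this example, not requirements stated in the answer.

```python
import json
import os


def build_pdf_print_prefs(save_dir):
    """Return a Chrome 'prefs' dict that prints to PDF into save_dir."""
    save_dir = os.path.abspath(save_dir)
    os.makedirs(save_dir, exist_ok=True)  # create the target folder if needed

    # Same print-preview state as in the answer: preselect "Save as PDF".
    app_state = {
        'recentDestinations': [
            {'id': 'Save as PDF', 'origin': 'local', 'account': ''}
        ],
        'selectedDestinationId': 'Save as PDF',
        'version': 2,
    }
    return {
        'printing.print_preview_sticky_settings.appState': json.dumps(app_state),
        'savefile.default_directory': save_dir,
    }
```

The result would be passed as chrome_options.add_experimental_option('prefs', build_pdf_print_prefs('path/to/dir/')), together with the --kiosk-printing argument shown in the answer.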

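Answer 2 (score: 0)

Another workaround: just let Chrome save the file where it normally does, then move and rename it as needed.

What the code below does: it checks the modification time of every (PDF) file in the download directory and compares it with the current time. If the difference is less than some value (say, 15 seconds), the file is presumably the right one, so it is moved/renamed to where you need it.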

import os
import time
import json
from selenium import webdriver

appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}

profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}

download_path = r'C:\Users\Oli\Downloads'  # folder where the browser saves files
new_path = r'C:\Users\Oli\Google Drive'  # folder to move the file to

chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(options=chrome_options)

driver.get('http://example.com/')
driver.execute_script('window.print();')

new_filename = 'new_name.pdf'  # name to give the moved file
timestamp_now = time.time()  # current time
# Walk through the files in the download directory
for (dirpath, dirnames, filenames) in os.walk(download_path):
    for filename in filenames:
        if filename.lower().endswith('.pdf'):
            # join with dirpath so files found in subfolders resolve correctly
            full_path = os.path.join(dirpath, filename)
            timestamp_file = os.path.getmtime(full_path)  # last modification time
            # if the file was modified less than 15 seconds ago, move it
            if (timestamp_now - timestamp_file) < 15:
                full_new_path = os.path.join(new_path, new_filename)
                os.rename(full_path, full_new_path)
                print(full_path + ' is moved to ' + full_new_path)

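Note: this is only an example. You need to think through every step. To make the code robust, you may want to add some exception handling. It would also be better to move this extra code into a function, and so on.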
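
One way the move/rename step could look as a function with basic error handling is sketched below. It is an illustration rather than the answer's own code: move_recent_pdf is a hypothetical name, it only looks at the top level of the download folder (no os.walk), it picks the single newest PDF, and it uses shutil.move instead of os.rename so that a move to another drive can also work.

```python
import os
import shutil
import time


def move_recent_pdf(download_dir, target_path, max_age_seconds=15):
    """Move the newest PDF in download_dir to target_path if it is recent.

    Returns target_path on success, or None if nothing was moved.
    """
    now = time.time()
    candidates = []
    for name in os.listdir(download_dir):
        full_path = os.path.join(download_dir, name)
        if name.lower().endswith('.pdf') and os.path.isfile(full_path):
            candidates.append((os.path.getmtime(full_path), full_path))
    if not candidates:
        return None

    # max() compares the modification times first, so this is the newest file.
    newest_mtime, newest_path = max(candidates)
    if now - newest_mtime >= max_age_seconds:
        return None  # the newest PDF is too old to be the one just printed

    try:
        shutil.move(newest_path, target_path)
    except OSError as error:
        print('Could not move {0}: {1}'.format(newest_path, error))
        return None
    return target_path
```

With the names from the answer, a call might look like move_recent_pdf(download_path, os.path.join(new_path, new_filename)), placed after the window.print() call.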