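I want to save a web page as a PDF file. This works, but the file should be saved to a specific path; instead it just ends up in my default downloads directory.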
import json
from selenium import webdriver
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2,
    'download.default_directory': 'C:\\Users\\Oli\\Google Drive',
    "download.directory_upgrade": True
}
profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('https://www.google.com/')
driver.execute_script('window.print();')
Does anyone also know a way to save the file under a specific name?
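Answer 0 (score: 2)
download.default_directory cannot be added to appState; it has to go directly into the "prefs" passed to add_experimental_option, like this:
chrome_options.add_experimental_option("prefs", {
    'download.default_directory': 'C:\\Users\\Oli\\Google Drive',
    'download.directory_upgrade': True
})
In your case, however, this will not help, because this option sets the location for "File -> Save As", whereas you need "Print -> Save As".
As a workaround, you can use Chrome's --print-to-pdf argument (there is no need to run the Chrome webdriver; Chrome itself can run in headless mode).
Be careful: it runs silently and gives no warning if the file is not created (for example, if the directory does not exist, if you lack administrator rights for C:\Users, or if the web page does not exist).
You can always test the command on the command line (cmd). The script below builds it, prints it, and launches it: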
import os
path_to_file = 'C:\\Users\\Oli\\Google Drive\\'
name_of_file = '1.pdf'
page_to_open = 'http://example.com'
command_to_run = 'start chrome --headless --print-to-pdf="{0}{1}" {2}'.format(path_to_file, name_of_file, page_to_open)
print('launch:'+command_to_run)
os.popen(command_to_run)
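Because the headless run fails silently, one option is to wrap the launch in a function that checks whether the PDF actually appeared. This is only a minimal sketch: the helper names build_print_command and print_page_to_pdf, the timeout, and the Chrome executable path in the usage comment are assumptions for illustration; only the --headless and --print-to-pdf flags come from the answer above. It also assumes the Chrome process exits only after the file has been written.

```python
import os
import subprocess


def build_print_command(chrome_exe, pdf_path, url):
    """Return the argument list for a headless Chrome print-to-PDF run."""
    return [chrome_exe, '--headless', '--print-to-pdf={0}'.format(pdf_path), url]


def print_page_to_pdf(chrome_exe, pdf_path, url, timeout=60):
    """Run Chrome headless and raise if the PDF was not created."""
    target_dir = os.path.dirname(os.path.abspath(pdf_path))
    if not os.path.isdir(target_dir):
        raise FileNotFoundError('No such directory: ' + target_dir)
    # Remove a stale file so that its presence afterwards proves success.
    if os.path.exists(pdf_path):
        os.remove(pdf_path)
    # Passing a list avoids quoting problems with spaces in the path.
    subprocess.run(build_print_command(chrome_exe, pdf_path, url),
                   timeout=timeout, check=False)
    if not os.path.isfile(pdf_path):
        raise RuntimeError('Chrome did not create ' + pdf_path)
    return pdf_path


# Hypothetical usage (paths are assumptions, adjust them to your machine):
# print_page_to_pdf(r'C:\Program Files\Google\Chrome\Application\chrome.exe',
#                   r'C:\Users\Oli\Google Drive\1.pdf', 'http://example.com')
```

Raising an exception turns the silent failure into a visible one, which is usually easier to debug than a missing file discovered later.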
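Answer 1 (score: 1)
The download.default_directory setting applies only to downloaded content. Chrome handles files saved from a page differently. To change the default folder for a printed page, set the savefile.default_directory value instead.
So a complete example that prints to PDF in a custom location looks like this: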
import json
from selenium import webdriver
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local",
            "account": ""
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}
profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState),
           'savefile.default_directory': 'path/to/dir/'}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(options=chrome_options)
url = 'http://example.com/'  # placeholder: the page you want to print
driver.get(url)
driver.execute_script('window.print();')
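If several scripts need the same setup, the preference dictionary can be built by a small function. The sketch below only repackages the settings from this answer; the function name build_print_prefs and the folder name pdf_out are made up for illustration, and converting the folder to an absolute path is an assumption about what Chrome handles most reliably.

```python
import json
import os


def build_print_prefs(save_dir):
    """Return Chrome prefs that select "Save as PDF" and set the output folder."""
    app_state = {
        "recentDestinations": [
            {"id": "Save as PDF", "origin": "local", "account": ""}
        ],
        "selectedDestinationId": "Save as PDF",
        "version": 2,
    }
    return {
        'printing.print_preview_sticky_settings.appState': json.dumps(app_state),
        # Assumption: an absolute path is the safest value for this pref.
        'savefile.default_directory': os.path.abspath(save_dir),
    }


# Usage with Selenium (requires the selenium package and chromedriver):
# chrome_options = webdriver.ChromeOptions()
# chrome_options.add_experimental_option('prefs', build_print_prefs('pdf_out'))
# chrome_options.add_argument('--kiosk-printing')
```

Note that the directory setting sits next to appState in the prefs, not inside it, which is the mistake in the question's code.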
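Answer 2 (score: 0)
Another workaround: just save the file as it is, then move and rename it as needed.
What the code below does: it checks the timestamp (here, the last-modified time) of every PDF file in the download directory and compares it with the current time. If the difference is below some threshold (for example 15 seconds), the file is presumably the right one, so it is moved and renamed to where you need it.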
import os
import time
import json
from selenium import webdriver
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}
profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
download_path = r'C:\Users\Oli\Downloads'  # Path where the browser saves files
new_path = r'C:\Users\Oli\Google Drive'  # Path to move the file to
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(options=chrome_options)
driver.get('http://example.com/')
driver.execute_script('window.print();')
new_filename = 'new_name.pdf'  # Name for the moved file
timestamp_now = time.time()  # Current time
# Go through the files in the download directory (including subdirectories)
for (dirpath, dirnames, filenames) in os.walk(download_path):
    for filename in filenames:
        if filename.lower().endswith('.pdf'):
            # Join with dirpath, since os.walk also descends into subdirectories
            full_path = os.path.join(dirpath, filename)
            timestamp_file = os.path.getmtime(full_path)  # Last modification time
            # If the file is less than 15 seconds old, move it
            if (timestamp_now - timestamp_file) < 15:
                full_new_path = os.path.join(new_path, new_filename)
                os.rename(full_path, full_new_path)
                print(full_path + ' is moved to ' + full_new_path)
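Note: this is only an example. You need to think through every step yourself. To make the code robust you will probably want to add some exception handling, and it would be better to move this extra code into a function, and so on.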
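As a rough illustration of that last suggestion, the move-and-rename step could be wrapped in a function like the one below. This is a sketch under assumptions, not the answerer's code: the name move_recent_pdf, the polling interval and the wait parameter are made up, it looks only at the top level of the download directory, and it uses shutil.move in place of os.rename so that the move also works across drives.

```python
import os
import shutil
import time


def move_recent_pdf(download_dir, target_path, max_age=15, wait=10):
    """Move the newest PDF modified within max_age seconds to target_path.

    Polls download_dir for up to `wait` seconds. Returns target_path,
    or None if no recent PDF shows up in time.
    """
    deadline = time.time() + wait
    while True:
        now = time.time()
        candidates = []
        for name in os.listdir(download_dir):
            full_path = os.path.join(download_dir, name)
            if name.lower().endswith('.pdf') and os.path.isfile(full_path):
                mtime = os.path.getmtime(full_path)
                if now - mtime < max_age:
                    candidates.append((mtime, full_path))
        if candidates:
            # Pick the most recently modified candidate.
            newest = max(candidates)[1]
            shutil.move(newest, target_path)
            return target_path
        if now >= deadline:
            return None
        time.sleep(0.5)


# Hypothetical usage after driver.execute_script('window.print();'):
# move_recent_pdf(r'C:\Users\Oli\Downloads',
#                 r'C:\Users\Oli\Google Drive\new_name.pdf')
```

Polling for a few seconds gives the browser time to finish writing the file, which the one-shot scan above does not account for.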