I'm working on a Selenium script that downloads an Excel file, and I want to give the file a specific name.
Is there any way to give the downloaded file a specific name?
Code:
#!/usr/bin/python
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
profile = FirefoxProfile()
# 2 = save into the folder given in browser.download.dir (otherwise that folder is ignored)
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
profile.set_preference("browser.download.dir", "C:\\Downloads")
browser = webdriver.Firefox(firefox_profile=profile)
browser.get('https://test.com/')
browser.find_element_by_partial_link_text("Excel").click()  # Download file
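Answer 0 (score: 8)
You cannot specify the name of the downloaded file through Selenium. However, you can download the file, find the newest file in the download folder, and then rename it as needed.
Note: this method was borrowed from a Google search and may contain errors, but you get the idea.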
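import os
import shutil
dirpath = r'c:\downloads'  # the folder the browser saves into
newfilename = os.path.join(dirpath, 'report.xls')  # hypothetical target name, use your own
# join the folder to each name, otherwise getctime() looks in the current working directory
filename = max([os.path.join(dirpath, f) for f in os.listdir(dirpath)], key=os.path.getctime)
shutil.move(filename, newfilename)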
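Taking the newest file in the folder can pick up a stale file if the download has not started yet, or a browser temp file if it has not finished. A minimal sketch of a variant, assuming you can list the folder right before clicking: snapshot the folder, click, then wait until a new entry appears and no partial-download file (.part for Firefox, .crdownload for Chrome) remains among the new entries. The helper names snapshot and wait_for_new_file are hypothetical, not part of Selenium.

```python
import os
import time

# Suffixes browsers use while a download is still in progress
# (.part for Firefox, .crdownload for Chrome).
PARTIAL_SUFFIXES = (".part", ".crdownload")


def snapshot(folder):
    """Return the set of entry names currently in folder."""
    return set(os.listdir(folder))


def wait_for_new_file(folder, before, timeout=60.0, poll=0.5):
    """Wait until a finished file appears that was not in `before`.

    Returns the absolute path of the new file, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_names = snapshot(folder) - before
        # The download is still running while any new partial file exists.
        if new_names and not any(n.endswith(PARTIAL_SUFFIXES) for n in new_names):
            paths = [os.path.join(folder, n) for n in new_names]
            files = [p for p in paths if os.path.isfile(p)]
            if files:
                return max(files, key=os.path.getctime)
        time.sleep(poll)
    raise TimeoutError("no finished download appeared in %s" % folder)
```

Usage: call before = snapshot(download_dir) just before the click, then pass it to wait_for_new_file(download_dir, before) and rename the returned path with shutil.move. Diffing against a snapshot means files that were already in the folder can never be chosen.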
Answer 1 (score: 3)
Here is another simple solution: wait until the download completes, then read the downloaded file's name from the browser's downloads page.
Chrome:
import time

# method to get the downloaded file name (uses a global WebDriver named driver)
def getDownLoadedFileName(waitTime):
    driver.execute_script("window.open()")
    # switch to the new tab
    driver.switch_to.window(driver.window_handles[-1])
    # navigate to chrome downloads
    driver.get('chrome://downloads')
    # define the end time
    endTime = time.time() + waitTime
    while True:
        try:
            # get the download percentage
            downloadPercentage = driver.execute_script(
                "return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
            # check if downloadPercentage is 100 (otherwise keep waiting)
            if downloadPercentage == 100:
                # return the file name once the download is complete
                return driver.execute_script("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').text")
        except Exception:
            pass  # the element may not be rendered yet; keep polling
        time.sleep(1)
        if time.time() > endTime:
            break  # give up; the function returns None
Firefox:
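import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def getDownLoadedFileName(waitTime):
    handles = driver.window_handles
    driver.execute_script("window.open()")
    # new_window_is_opened needs the handles that existed before the new tab
    WebDriverWait(driver, 10).until(EC.new_window_is_opened(handles))
    driver.switch_to.window(driver.window_handles[-1])
    driver.get("about:downloads")
    endTime = time.time() + waitTime
    while True:
        try:
            # note: returns as soon as a name is shown, possibly before the download completes
            fileName = driver.execute_script("return document.querySelector('#contentAreaDownloadsView .downloadMainArea .downloadContainer description:nth-of-type(1)').value")
            if fileName:
                return fileName
        except Exception:
            pass
        time.sleep(1)
        if time.time() > endTime:
            break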
After clicking the download link/button, just call the method above.
# click on the download link (driver is the same WebDriver the helper above uses)
driver.find_element_by_partial_link_text("Excel").click()
# get the downloaded file name
latestDownloadedFileName = getDownLoadedFileName(180)  # wait up to 3 minutes for the download to complete
print(latestDownloadedFileName)
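The helpers above only return the file name; they do not rename anything. A minimal sketch of that last step, assuming download_dir is the same folder configured in the browser profile (C:\Downloads in the question) and that the downloads page reports the name exactly as it was saved on disk. rename_reported_download is a hypothetical helper name.

```python
import os
import shutil


def rename_reported_download(download_dir, reported_name, new_name):
    """Rename the file whose name the browser's downloads page reported.

    reported_name is just a file name (no folder), as returned by
    getDownLoadedFileName; it is None if the wait timed out.
    """
    if reported_name is None:
        raise RuntimeError("download did not finish in time")
    source = os.path.join(download_dir, reported_name)
    target = os.path.join(download_dir, new_name)
    shutil.move(source, target)
    return target
```

For example, rename_reported_download(r"C:\Downloads", latestDownloadedFileName, "report.xls"). Raising on None surfaces a timeout instead of failing later with a confusing path error.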
Java + Chrome:
Here is the method in Java:
public String waitUntilDonwloadCompleted(WebDriver driver) throws InterruptedException {
    // Store the current window handle
    String mainWindow = driver.getWindowHandle();
    // open a new tab
    JavascriptExecutor js = (JavascriptExecutor) driver;
    js.executeScript("window.open()");
    // switch to the newly opened window
    for (String winHandle : driver.getWindowHandles()) {
        driver.switchTo().window(winHandle);
    }
    // navigate to chrome downloads
    driver.get("chrome://downloads");
    JavascriptExecutor js1 = (JavascriptExecutor) driver;
    // wait until the file is downloaded (no timeout: loops until progress reaches 100)
    long percentage = 0;
    while (percentage != 100) {
        try {
            Object value = js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value");
            // executeScript returns whole numbers as Long and fractional ones as Double
            if (value instanceof Number) {
                percentage = ((Number) value).longValue();
            }
        } catch (Exception e) {
            // Nothing to do, just wait
        }
        Thread.sleep(1000);
    }
    // get the latest downloaded file name
    String fileName = (String) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').text");
    // get the latest downloaded file url
    String sourceURL = (String) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').href");
    // file icon URL, which encodes the downloaded location
    String downloadedAt = (String) js1.executeScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div.is-active.focus-row-active #file-icon-wrapper img').src");
    // print the details
    System.out.println("Download details");
    System.out.println("File Name :-" + fileName);
    System.out.println("Downloaded path :- " + downloadedAt);
    System.out.println("Downloaded from url :- " + sourceURL);
    // close the downloads tab
    driver.close();
    // switch back to the main window
    driver.switchTo().window(mainWindow);
    return fileName;
}
And here is how to call it from your Java code:
// download triggering step
downloadExe.click();
// now wait until the download finishes and then get the file name
System.out.println(waitUntilDonwloadCompleted(driver));
Output:
Download details
File Name :-RubyMine-2019.1.2 (7).exe
Downloaded path :- chrome://fileicon/C%3A%5CUsers%5Csupputuri%5CDownloads%5CRubyMine-2019.1.2%20(7).exe?scale=1.25x
Downloaded from url :- https://download-cf.jetbrains.com/ruby/RubyMine-2019.1.2.exe
RubyMine-2019.1.2 (7).exe
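Answer 2 (score: 2)
Regarding @parishodak's answer:
the filename there is only a relative path (just the file's name), not an absolute path.
That is why @FreshRamen got the following error: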
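File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/genericpath.py", line 72, in getctime
    return os.stat(filename).st_ctime
OSError: [Errno 2] No such file or directory: '.localized'
The fix is to join the download folder to each file name before calling os.path.getctime, so the lookup does not happen relative to the current working directory.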
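A minimal sketch of that correction, assuming you also want to skip hidden entries such as macOS's .localized (the file named in the traceback) and any sub-folders, since neither can be the download. newest_download is a hypothetical helper name.

```python
import os


def newest_download(folder):
    """Return the absolute path of the most recently created regular file.

    Hidden entries such as macOS's '.localized' and sub-folders are skipped.
    """
    paths = [
        os.path.join(folder, name)  # absolute path, so getctime() finds it
        for name in os.listdir(folder)
        if not name.startswith(".")
    ]
    files = [p for p in paths if os.path.isfile(p)]
    if not files:
        raise FileNotFoundError("no files in %s" % folder)
    return max(files, key=os.path.getctime)
```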
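Answer 3 (score: 2)
I hope this code isn't confusing. It took me a while to write, and it is quite useful because it solves the problem with nothing but the os and time modules.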
import os
import time
def tiny_file_rename(newname, folder_of_download):
    filename = max([f for f in os.listdir(folder_of_download)], key=lambda xa: os.path.getctime(os.path.join(folder_of_download, xa)))
    if '.part' in filename:
        time.sleep(1)
        # note: filename still holds the .part name found before the sleep (see Edit 2 below)
        os.rename(os.path.join(folder_of_download, filename), os.path.join(folder_of_download, newname))
    else:
        os.rename(os.path.join(folder_of_download, filename), os.path.join(folder_of_download, newname))
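Hope this saves someone's day, cheers.
Edit: Thanks to @Om Prakash for editing my code; it reminded me that I hadn't explained it. Using the max([]) function on its own can lead to a race condition that leaves you with an empty or corrupted file (I know this from experience). You want to check first that the file has finished downloading. Selenium does not wait for the download to complete, so when you look for the most recently created file, an incomplete file shows up in the list and the code tries to move it. Even then, it is better to wait a little so that Firefox releases the file.
Edit 2: more code
I was asked whether 1 second is enough time. If you need to wait longer, you can change the code above to: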
import os
import time
def tiny_file_rename(newname, folder_of_download, time_to_wait=60):
    time_counter = 0
    filename = max([f for f in os.listdir(folder_of_download)], key=lambda xa: os.path.getctime(os.path.join(folder_of_download, xa)))
    while '.part' in filename:
        time.sleep(1)
        time_counter += 1
        if time_counter > time_to_wait:
            raise Exception('Waited too long for file to download')
        filename = max([f for f in os.listdir(folder_of_download)], key=lambda xa: os.path.getctime(os.path.join(folder_of_download, xa)))
    os.rename(os.path.join(folder_of_download, filename), os.path.join(folder_of_download, newname))
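The race condition described in the edit can also bite after the .part file disappears, while the final bytes are still being flushed. A minimal sketch of a stricter check, assuming you already know the final path and that the partial file is named after it with the suffix appended (Firefox's .part convention; Chrome may use other temporary names): wait until no partial sibling exists and the size is unchanged across consecutive polls. wait_until_complete and its parameters are hypothetical.

```python
import os
import time

PARTIAL_SUFFIXES = (".part", ".crdownload")


def wait_until_complete(path, timeout=60.0, poll=0.5, stable_checks=2):
    """Wait until `path` exists, has no partial sibling, and its size settles.

    Returns the final size in bytes; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    last_size, stable = -1, 0
    while time.monotonic() < deadline:
        partial = any(os.path.exists(path + s) for s in PARTIAL_SUFFIXES)
        if os.path.isfile(path) and not partial:
            size = os.path.getsize(path)
            # Count consecutive polls that saw the same size.
            stable = stable + 1 if size == last_size else 0
            last_size = size
            if stable >= stable_checks:
                return size
        time.sleep(poll)
    raise TimeoutError("download did not finish: %s" % path)
```

Only after this returns would you call os.rename; raising the built-in TimeoutError keeps the failure explicit instead of renaming a half-written file.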
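Answer 4 (score: 2)
I came up with a different solution. Since you only care about the most recently downloaded file, why not download it into a dummy_dir? That way it will be the only file in that directory. Once the download finishes, you can move it to destination_dir and rename it.
Here is an example that works with Firefox: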
import os
import shutil
import time

def rename_last_downloaded_file(dummy_dir, destination_dir, new_file_name):
    def get_last_downloaded_file_path(dummy_dir):
        """ Return the path of the most recently created file (here: the last download).
            This function loops for as long as the directory is empty.
        """
        while not os.listdir(dummy_dir):
            time.sleep(1)
        return max([os.path.join(dummy_dir, f) for f in os.listdir(dummy_dir)], key=os.path.getctime)

    while '.part' in get_last_downloaded_file_path(dummy_dir):
        time.sleep(1)
    shutil.move(get_last_downloaded_file_path(dummy_dir), os.path.join(destination_dir, new_file_name))
Feel free to adjust the sleep time and to add a TimeoutException.
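A minimal sketch of the dummy-folder idea with a timeout added, assuming the browser is already configured to save into dummy_dir. It raises Python's built-in TimeoutError rather than Selenium's TimeoutException, and it treats both .part (Firefox) and .crdownload (Chrome) as unfinished. move_single_download is a hypothetical helper name.

```python
import os
import shutil
import time

PARTIAL_SUFFIXES = (".part", ".crdownload")


def move_single_download(dummy_dir, destination_dir, new_file_name, timeout=60.0, poll=1.0):
    """Move the finished file in dummy_dir to destination_dir/new_file_name.

    Raises TimeoutError if no finished file shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(dummy_dir)
        # Keep waiting while the folder is empty or a partial file is present.
        if names and not any(n.endswith(PARTIAL_SUFFIXES) for n in names):
            paths = [os.path.join(dummy_dir, n) for n in names]
            newest = max(paths, key=os.path.getctime)
            target = os.path.join(destination_dir, new_file_name)
            shutil.move(newest, target)
            return target
        time.sleep(poll)
    raise TimeoutError("nothing finished downloading into %s" % dummy_dir)
```

Because the file is moved out, dummy_dir is empty again for the next download, which keeps the "only file in the folder" assumption true.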
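Answer 5 (score: 1)
Here is an example of the code I use to download a PDF under a specific file name. First, configure the Chrome webdriver with the required options. Then, after clicking the button (which opens the PDF popup), call a function that waits for the download to finish and renames the downloaded file.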
import os
import time
import shutil
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
# function to wait for the download to finish and then rename the latest downloaded file
def wait_for_download_and_rename(newFilename):
    # returns the completed chrome downloads (an empty list keeps WebDriverWait polling)
    def chrome_downloads(drv):
        if not "chrome://downloads" in drv.current_url:  # if 'chrome downloads' is not the current tab
            drv.execute_script("window.open('');")  # open a new tab
            drv.switch_to.window(drv.window_handles[1])  # switch to the new tab
            drv.get("chrome://downloads/")  # navigate to chrome downloads
        return drv.execute_script("""
            return document.querySelector('downloads-manager')
                .shadowRoot.querySelector('#downloadsList')
                .items.filter(e => e.state === 'COMPLETE')
                .map(e => e.filePath || e.file_path || e.fileUrl || e.file_url);
            """)
    # wait until at least one completed download is listed
    dld_file_paths = WebDriverWait(driver, 120, 1).until(chrome_downloads)  # returns list of downloaded file paths
    # Close the current tab (chrome downloads)
    if "chrome://downloads" in driver.current_url:
        driver.close()
    # Switch back to the original tab
    driver.switch_to.window(driver.window_handles[0])
    # get the latest downloaded file name and path
    dlFilename = dld_file_paths[0]  # latest downloaded file from the list
    # wait until the downloaded file appears in the download directory
    time_to_wait = 20  # adjust the timeout to your needs
    time_counter = 0
    while not os.path.isfile(dlFilename):
        time.sleep(1)
        time_counter += 1
        if time_counter > time_to_wait:
            break
    # rename the downloaded file
    shutil.move(dlFilename, os.path.join(download_dir, newFilename))
# specify a custom download directory
download_dir = r'c:\Downloads\pdf_reports'
# configure chrome's pdf viewer so pdf popup reports are downloaded
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', {
    "download.default_directory": download_dir,  # set your own download path
    "download.prompt_for_download": False,  # do not ask where to save at runtime
    "download.directory_upgrade": True,  # also needed to suppress the download prompt
    "plugins.plugins_disabled": ["Chrome PDF Viewer"],  # disable the built-in PDF viewer
    "plugins.always_open_pdf_externally": True,  # download PDFs instead of opening them
})
# get webdriver with options for configuring chrome pdf viewer
driver = webdriver.Chrome(options=chrome_options)
# open the desired webpage
driver.get('https://mywebsite.com/mywebpage')
# click the button that opens the pdf popup
driver.find_element_by_id('someid').click()
# wait for the download to finish and rename the downloaded file
wait_for_download_and_rename('My file.pdf')
# close the browser windows
driver.quit()
Set the timeout (120) to however long you need to wait.
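The JavaScript in chrome_downloads falls back to e.fileUrl / e.file_url when no plain path is present. If a file:// URL is what comes back, os.path.isfile(dlFilename) will never become true. A small sketch that normalizes either form using only the standard library's urllib.parse and urllib.request; to_local_path is a hypothetical helper name.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def to_local_path(entry):
    """Turn a chrome://downloads entry (plain path or file:// URL) into a local path."""
    if entry.startswith("file:"):
        # url2pathname undoes percent-encoding (and maps /C:/x to C:\x on Windows)
        return url2pathname(urlparse(entry).path)
    return entry
```

In the answer's function this would be dlFilename = to_local_path(dld_file_paths[0]). Plain paths pass through unchanged, so it is harmless when Chrome already reports filePath.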
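Answer 6 (score: 1)
I use the following function. It looks at the files in the download folder you configured for Chrome/Selenium and renames the newest one only if it was created less than 10 seconds (max_old_time) before the function started. Otherwise, it keeps checking for up to 60 seconds (max_waiting_time).
Not sure it's the best way, but it works for me.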
import os, shutil, time
from datetime import datetime
def rename_last_file(download_folder, destination_folder, newfilename):
    # Wait at most max_waiting_time seconds for a new file in the folder.
    max_waiting_time = 60
    # Rename only if the newest file was created less than max_old_time seconds before we started.
    max_old_time = 10
    start_time = datetime.now().timestamp()
    while True:
        last_file_fullpath = None
        last_file_time = 0
        for current_file in os.listdir(download_folder):
            current_file_fullpath = os.path.join(download_folder, current_file)
            if not os.path.isfile(current_file_fullpath):
                continue  # skip sub-folders
            current_file_time = os.path.getctime(current_file_fullpath)
            if current_file_time > last_file_time:
                last_file_fullpath = current_file_fullpath
                last_file_time = current_file_time
        # last_file_fullpath stays None while the folder has no files
        if last_file_fullpath and start_time - last_file_time < max_old_time:
            shutil.move(last_file_fullpath, os.path.join(destination_folder, newfilename))
            print(last_file_fullpath)
            return 0
        elif datetime.now().timestamp() - start_time > max_waiting_time:
            print("exit")
            return 1
        else:
            print("waiting file...")
            time.sleep(5)
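Answer 7 (score: 0)
Using @dmb's trick, I made just one correction: inside the .part check, after time.sleep(1), we have to look up the file name again. Otherwise the line after it tries to rename the .part file, which no longer exists.
Answer 8 (score: -1)
You can use urlretrieve: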
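import urllib.request  # Python 3; in Python 2 this was urllib.urlretrieve
url = browser.find_element_by_partial_link_text("Excel").get_attribute('href')
# note: this is a separate HTTP request that does not carry the browser's session cookies
urllib.request.urlretrieve(url, "/choose/your/file_name.xlsx")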
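urlretrieve makes a fresh request, so if the Excel link sits behind a login it will usually receive a login page instead of the file. A minimal sketch, assuming the site authenticates with cookies: copy the cookies Selenium holds (browser.get_cookies() returns a list of dicts with name and value keys) into a Cookie header for urllib.request. The helper names cookie_header and download_with_session are hypothetical.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from driver.get_cookies() output (list of dicts)."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in selenium_cookies)


def download_with_session(url, selenium_cookies, file_name):
    """Fetch url with the browser's cookies and save the body under file_name."""
    request = urllib.request.Request(url, headers={"Cookie": cookie_header(selenium_cookies)})
    with urllib.request.urlopen(request) as response, open(file_name, "wb") as out:
        out.write(response.read())
    return file_name
```

Usage: download_with_session(url, browser.get_cookies(), "/choose/your/file_name.xlsx"). Since you write the file yourself, you choose its name directly and never need to search the download folder.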