Downloading in several threads

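Date: 2017-03-19 20:35:27

Tags: python google-chrome selenium

Since I'm based in China, using YouTube is a headache for me. I'm trying to build a playlist downloader (I know about youtube-dl, but I want to learn something new), and it now basically does its job. Because my connection is unstable (a VPN from China), whenever a connection problem occurs the browser stops downloading but never returns control to my script. How can I handle this situation? The whole file is here: github link. This is the code (it downloads through a 3rd-party service):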

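import os
import time

from selenium import webdriver


def checkFileStatus(file_name):
    """Helper function for getDownloads; waits for the file to finish downloading."""
    print('Ok, writing exactly ' + file_name)
    checkFlag = False
    while not checkFlag:
        if os.path.exists(file_name):
            print(file_name + " exists")
            # Compare two size readings taken 5 seconds apart;
            # an unchanged size is treated as "download finished".
            check_one = os.path.getsize(file_name)
            time.sleep(5)
            check_two = os.path.getsize(file_name)
            if check_two == check_one:
                checkFlag = True
                print(file_name + ' has been saved')
                return
        else:
            # File has not appeared yet; this branch loops forever if it never does.
            time.sleep(5)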


def getDownloads(clip_links, home_dir):
    """Manages the whole downloading process
    - opens WebDriver with Chrome
    - saves and renames files from valid links
    - quits WebDriver"""
    chromeOptions = webdriver.ChromeOptions()
    prefs = {"download.default_directory": home_dir}
    chromeOptions.add_experimental_option("prefs", prefs)
    # Raw string so the backslashes in the Windows path are not read as escapes
    chromedriver = r'C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe'
    savemediaurl = 'http://savemedia.com/'
    for index, entry in enumerate(clip_links):
        driver = webdriver.Chrome(executable_path=chromedriver, chrome_options=chromeOptions)
        saved_file = downloadFromSavemedia(savemediaurl, driver, entry[1], index+1)
        if saved_file:
            old_name = entry[0] + '.mp4'
            checkFileStatus(old_name)
            new_name = str(index+1).zfill(2) + '. ' + entry[0] + '.mp4'
            os.rename(old_name, new_name)
        driver.quit()
    return

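1 Answer:

Answer 0 (score: 0)

If I understand your question correctly, your download stalls completely without handing control back to your script, and you need a way to detect and handle that situation?

You could try combining the timeout-decorator and retry packages: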

from retry import retry
from timeout_decorator import timeout, TimeoutError

# <...>

@retry(TimeoutError, tries=3)
@timeout(300)  # Raise TimeoutError if one attempt takes longer than 300 seconds
def _download(chromedriver, chromeOptions, savemediaurl, entry, index):
    # chromeOptions is passed in explicitly; it is local to getDownloads
    driver = webdriver.Chrome(executable_path=chromedriver, chrome_options=chromeOptions)
    try:
        saved_file = downloadFromSavemedia(savemediaurl, driver, entry[1], index+1)
        if saved_file:
            old_name = entry[0] + '.mp4'
            checkFileStatus(old_name)
            new_name = str(index+1).zfill(2) + '. ' + entry[0] + '.mp4'
            os.rename(old_name, new_name)
    finally:
        # Always close the browser, even when the timeout fires
        driver.quit()

def getDownloads(clip_links, home_dir):
    """Manages the whole downloading process
    - opens WebDriver with Chrome
    - saves and renames files from valid links
    - quits WebDriver"""
    chromeOptions = webdriver.ChromeOptions()
    prefs = {"download.default_directory": home_dir}
    chromeOptions.add_experimental_option("prefs", prefs)
    # Raw string so the backslashes in the Windows path are not read as escapes
    chromedriver = r'C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe'
    savemediaurl = 'http://savemedia.com/'
    for index, entry in enumerate(clip_links):
        _download(chromedriver, chromeOptions, savemediaurl, entry, index)

    return

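In the code above, if the whole function call takes longer than 5 minutes, the timeout decorator raises a TimeoutError, which the retry decorator then catches and handles by calling the function again. This is repeated up to 3 times, after which the retry decorator re-raises the TimeoutError.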
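To make the retry behaviour concrete, here is a minimal standard-library sketch of what a retry decorator does. The names `retry_on` and `flaky_download` are hypothetical and only for illustration; this is not the code of the retry package, and it uses Python's built-in `TimeoutError` rather than the one exported by timeout-decorator.

```python
import functools


def retry_on(exc_type, tries=3):
    """Hypothetical stand-in for a retry decorator: re-run func when exc_type is raised."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == tries:
                        raise  # out of attempts: let the caller see the error
        return wrapper
    return decorator


calls = []  # records every attempt so the behaviour can be inspected


@retry_on(TimeoutError, tries=3)
def flaky_download(succeed_on):
    """Simulated download that times out until attempt number `succeed_on`."""
    calls.append(1)
    if len(calls) < succeed_on:
        raise TimeoutError("simulated stalled download")
    return "done after %d attempt(s)" % len(calls)
```

Calling `flaky_download(2)` fails once and succeeds on the second attempt, while `flaky_download(10)` gives up after three attempts and lets the `TimeoutError` propagate to the caller.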

There are two things to note:

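First, you will need to tune the timeout value. A download may still be making progress but simply taking longer than expected, and the timeout decorator will kill it anyway.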

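Second, I added a try/finally block to make sure the driver is shut down properly when the timeout is raised. Without it, you would start accumulating zombie Chrome processes, which will bog down your system.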
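Regarding the first point, one way to avoid killing a slow but healthy download is to time out on lack of progress rather than on total duration. The sketch below uses only the standard library. It assumes the browser writes to a temporary partial file (for Chrome, commonly the target name plus `.crdownload`) and creates the final file only when the download completes; the function name, parameters and that naming are assumptions, not something taken from the question. It also does not depend on timeout-decorator, which, as far as I know, relies on `signal.SIGALRM` by default, and that signal is not available on Windows.

```python
import os
import time


def _size(path):
    """Return the file size in bytes, or None if the file does not exist (yet)."""
    try:
        return os.path.getsize(path)
    except OSError:
        return None


def wait_for_download(final_path, partial_path, stall_timeout=60.0, poll_interval=5.0):
    """Wait until final_path exists and return its size.

    Raises TimeoutError if the partial file has not changed size (or has not
    appeared at all) for more than stall_timeout seconds.
    """
    last_size = None
    last_progress = time.monotonic()
    while True:
        final_size = _size(final_path)
        if final_size is not None:
            return final_size  # the final file exists: download finished
        size = _size(partial_path)
        now = time.monotonic()
        if size != last_size:
            # The partial file appeared or grew: reset the stall clock
            last_size, last_progress = size, now
        elif now - last_progress > stall_timeout:
            raise TimeoutError("no download progress for %.0f seconds" % stall_timeout)
        time.sleep(poll_interval)
```

A call such as `wait_for_download(old_name, old_name + '.crdownload')` could take the place of `checkFileStatus(old_name)`, with the resulting `TimeoutError` handled by whatever retry logic wraps the download.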

(This is all untested.)