The following code downloads one file at a time from each given URL:
from selenium import webdriver

with open("url_lists.txt", "r") as fi:  # The text file contains hundreds of urls
    urls = fi.read().splitlines()

for url in urls:
    browser = webdriver.Firefox()
    browser.get(url)
    browser.find_element_by_id('download').click()
I would like to modify the code so that 5 separate browsers open 5 URLs at the same time and download all 5 files at once.
How can I accomplish this?
Answer 0 (score: 3)
You can use threading.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from selenium import webdriver
from threading import Thread

with open("url_lists.txt", "r") as fi:  # The text file contains hundreds of urls
    urls = fi.read().splitlines()

def func(url, bro):
    # Resolve the browser class by name (e.g. 'Firefox', 'Chrome'); fall back to Firefox
    browser_func = getattr(webdriver, bro, webdriver.Firefox)
    browser = browser_func()
    browser.get(url)
    browser.find_element_by_id('download').click()

if __name__ == '__main__':
    batch = urls[:5]                  # take the first five urls
    bros = ['Firefox'] * len(batch)   # one browser name per url
    threads = []
    for i in range(len(batch)):
        threads.append(Thread(target=func, args=[batch[i], bros[i]]))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
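If you want to work through the whole file five URLs at a time rather than only the first five, a fixed-size thread pool is a simpler way to cap the number of browsers. Here is a minimal sketch (not part of the original answer) using the standard library's concurrent.futures, reusing the func and urls defined above; the 'Firefox' argument is just a default browser choice:

from concurrent.futures import ThreadPoolExecutor

if __name__ == '__main__':
    # At most 5 browsers run at the same time; the pool feeds in the
    # remaining urls as workers become free
    with ThreadPoolExecutor(max_workers=5) as executor:
        for url in urls:
            executor.submit(func, url, 'Firefox')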
Answer 1 (score: 0)
You can use gevent:
from gevent import monkey
monkey.patch_all()

from gevent import spawn, joinall
from selenium import webdriver

def worker(url, worker_number):
    browser = webdriver.Firefox()
    print('worker #%s getting "%s"' % (worker_number, url))
    browser.get(url)
    print('worker #%s got "%s"' % (worker_number, url))

if __name__ == '__main__':
    print('start')
    with open('url_lists.txt', 'r') as fh:
        joinall([spawn(worker, url.strip(), i) for i, url in enumerate(fh.readlines())])
    print('stop')
This example spawns as many greenlets (workers) as there are URLs in the file. If the file contains a large number of URLs, it is better to use a queue or a limited pool of workers to control resource usage and download only, say, 50 URLs at a time.
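A minimal sketch of that pool-based approach, assuming the same url_lists.txt file and download-button id as above (the pool size of 50 is just an illustrative value); gevent.pool.Pool caps how many greenlets run at once:

from gevent import monkey
monkey.patch_all()

from gevent.pool import Pool
from selenium import webdriver

def worker(url, worker_number):
    browser = webdriver.Firefox()
    browser.get(url)
    browser.find_element_by_id('download').click()

if __name__ == '__main__':
    pool = Pool(50)  # at most 50 greenlets (hence 50 browsers) at a time
    with open('url_lists.txt', 'r') as fh:
        for i, url in enumerate(fh):
            pool.spawn(worker, url.strip(), i)
    pool.join()  # wait for every download to finish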