Hi, I have a file named image.txt that contains roughly 500,000 image URLs. I want to read the URLs, download each image, and save it to a directory. If an image fails to download, I want to print the exception and continue with the rest. How can I achieve this in an optimized way?
import sys
import os
import urllib

def isValidFile(path):
    if not os.path.isfile(path):
        print "Path " + path + " doesn't exist! Aborting..."
        exit(1)

def isValidDir(path):
    if not os.path.isdir(path):
        print "Path " + path + " doesn't exist! Aborting..."
        exit(1)

def normalize(url):
    # Use the last path segment of the URL as the file name.
    url = url.split("/")[-1]
    return url.split("\n")[0]

# Execution Starts Here
urls = sys.argv[1]
isValidFile(urls)
out_dir = sys.argv[2]
isValidDir(out_dir)

with open(urls) as url_array:
    for url in url_array:
        urllib.urlretrieve(url, os.path.join(out_dir, normalize(url)))

print("Images Downloaded")
Answer 1 (score: 0)
If you want a pure Python solution, you can try this:
import os
import urllib

def getImage(url, dest):
    with open(dest, 'wb') as fh:
        fh.write(urllib.urlopen(url).read())

with open('image.txt') as f:
    for url in f:
        url = url.strip()  # drop the trailing newline so the file name is clean
        try:
            getImage(url, os.path.basename(url))
        except Exception as exc:
            print "Error downloading {}: {}".format(url, exc)
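Since there are around 500,000 URLs and downloading is I/O-bound, a sequential loop will be slow. One way to speed it up is a thread pool; here is a hedged Python 3 sketch (not from the answer above) using `concurrent.futures` and `urllib.request` — the `image.txt` input and output-directory layout are taken from the question, and the worker count is an arbitrary assumption:

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

def download(url, out_dir):
    # Derive the file name from the last path segment of the URL.
    url = url.strip()
    dest = os.path.join(out_dir, url.split("/")[-1])
    urllib.request.urlretrieve(url, dest)
    return dest

def download_all(urls, out_dir, workers=16):
    # Submit every URL to a thread pool; failures are printed and skipped,
    # so one bad URL never stops the remaining downloads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download, u, out_dir): u for u in urls}
        for fut in as_completed(futures):
            try:
                fut.result()
            except Exception as exc:
                print("Error downloading {}: {}".format(futures[fut], exc))
```

Usage would be `download_all(open("image.txt"), "output_dir")`. The right `workers` value depends on your bandwidth and the remote server's tolerance; 8 to 32 threads is a common starting range for bulk HTTP fetching.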