I have a script that downloads images from links. If the script terminates for some reason, I want to save how far the downloads got, and resume from that saved position the next time it runs.
So far I have written the download script and tried to save the program's state with pickle:
import pandas as pd
import requests as rq
import os, time, random, pickle
import csv

data = pd.read_csv("consensus_data.csv", usecols=["CaptureEventID", "Species"])
z = data.loc[data.Species.isin(['buffalo']), :]
df1 = pd.DataFrame(z)
data_2 = pd.read_csv("all_images.csv")
df2 = pd.DataFrame(data_2)
df3 = pd.merge(df1, df2, on='CaptureEventID')
p = df3.to_csv('animal_img_list.csv', index=False)

# you need to change the location below
data_final = pd.read_csv("animal_img_list.csv")
output = "/home/avnika/data_serengeti/url_op"

mylist = []
for i in range(0, 100):
    x = random.randint(1, 10)
    mylist.append(x)
print(mylist)
for y in range(len(mylist)):
    d = mylist[y]
    print(d)

file_name = data_final.URL_Info
print(len(file_name))

for file in file_name:
    image_url = 'https://snapshotserengeti.s3.msi.umn.edu/' + file
    f_name = os.path.split(image_url)[-1]
    print(f_name)
    r = rq.get(image_url)
    with open(output + "/" + f_name, 'wb') as f:
        f.write(r.content)
    time.sleep(d)

with open("/home/avnika/data_serengeti", "wb") as fp:
    pickle.dump(r, fp)
with open("/home/avnika/data_serengeti", "rb") as fp:
    pic_obj = pickle.load(fp)
Suppose I have to download 4000 images from URLs. I successfully downloaded 1000, but then my script crashed because of a network problem. When the script restarts, I want it to resume downloading from image number 1001. Currently, if the script restarts, it starts again from image number 1. How do I run the loop again after loading the pickled object?
Answer 0 (score: 1)
There may be more than one solution to this problem, but keeping the following in mind will help you solve it.

Approach:

The script downloads from the beginning simply because it does not remember the index it had reached on the last run.

To fix this, we create a text file holding the integer 0, which represents the index up to which files have been downloaded. When the script runs, it first reads the integer from the text file (think of it as recalling the position). Each time a file downloads successfully, the value in the text file is incremented by 1.

Code

A small example to illustrate the idea:

Note: I manually created a text file containing '0' beforehand.
# Opening the text file
counter = open('counter.txt', "r")
# Getting the position from where to start. Initially it's 0; later it gets updated.
start = counter.read()
print("--> ", start)
counter.close()

for x in range(int(start), 1000):
    print("Processing done up to: ", x)
    # After each iteration, write the next position to the file,
    # so a restart continues with the first unprocessed index.
    writer = open('counter.txt', "w")
    writer.write(str(x + 1))
    writer.close()
Fixing your code:

Note: manually create a text file named 'counter.txt' and write '0' in it.
import pandas as pd
import requests as rq
import os, time, random, pickle
import csv

data = pd.read_csv("consensus_data.csv", usecols=["CaptureEventID", "Species"])
z = data.loc[data.Species.isin(['buffalo']), :]
df1 = pd.DataFrame(z)
data_2 = pd.read_csv("all_images.csv")
df2 = pd.DataFrame(data_2)
df3 = pd.merge(df1, df2, on='CaptureEventID')
p = df3.to_csv('animal_img_list.csv', index=False)

# you need to change the location below
data_final = pd.read_csv("animal_img_list.csv")
output = "/home/avnika/data_serengeti/url_op"

mylist = []
for i in range(0, 100):
    x = random.randint(1, 10)
    mylist.append(x)
print(mylist)
for y in range(len(mylist)):
    d = mylist[y]
    print(d)

# Opening the file you manually created with '0' in it.
counter = open('counter.txt', "r")
start = int(counter.read())  # convert to int so it can be used for slicing and counting
count = start
counter.close()

file_name = data_final.URL_Info
print(len(file_name))

# The saved position is used to slice file_name, so the loop resumes from 'start'.
for file in file_name[start:]:
    image_url = 'https://snapshotserengeti.s3.msi.umn.edu/' + file
    f_name = os.path.split(image_url)[-1]
    print(f_name)
    r = rq.get(image_url)
    with open(output + "/" + f_name, 'wb') as f:
        f.write(r.content)
    # The file is downloaded; update the counter in the text file with the new position.
    count += 1
    writer = open('counter.txt', "w")
    writer.write(str(count))
    writer.close()
    time.sleep(d)
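Since the original crash was a network error, it can also help to catch download failures explicitly so the counter is only advanced after a successful write. The sketch below is one possible refactoring of the same counter technique, not the answer's exact code; `resume_downloads` and `download_one` are hypothetical names, with `download_one` standing in for the `requests` call.

```python
import os

def resume_downloads(urls, counter_path, download_one):
    # Read the last saved position; default to 0 if the counter file is missing.
    start = 0
    if os.path.exists(counter_path):
        with open(counter_path) as fp:
            start = int(fp.read() or 0)

    for index in range(start, len(urls)):
        try:
            download_one(urls[index])
        except OSError:
            # Network failure: stop now. The counter still points at this
            # file, so the next run retries it instead of skipping it.
            break
        # Persist the next index only after a successful download.
        with open(counter_path, "w") as fp:
            fp.write(str(index + 1))
```

Calling `resume_downloads` again after a failure picks up at the first file that did not finish, which matches the resume-from-1001 behaviour asked for in the question.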
Hope this helps :)
Answer 1 (score: -1)
The answer to your question is to use pickle, which saves Python objects to disk: you can pickle the position and retrieve it when the script restarts. Whenever you hit an error, catch it and write the image number you want to restart from into the pickle file. You can refer to the link below.
https://pythonprogramming.net/python-pickle-module-save-objects-serialization
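A minimal sketch of that idea, pickling only the resume index rather than the response object (the state-file name here is hypothetical):

```python
import os
import pickle

STATE_FILE = "download_state.pkl"  # hypothetical state-file name

def load_resume_index():
    """Return the saved index, or 0 on a fresh start."""
    if not os.path.exists(STATE_FILE):
        return 0
    with open(STATE_FILE, "rb") as fp:
        return pickle.load(fp)

def save_resume_index(index):
    """Persist the index of the next image to download."""
    with open(STATE_FILE, "wb") as fp:
        pickle.dump(index, fp)
```

On restart, slice the URL list with `load_resume_index()` and call `save_resume_index(i + 1)` after each successful download; this avoids pickling the `requests` response object, which does not record the loop position at all.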