I'm new to Python. I'm trying to write a script that downloads PDFs from a website. After downloading roughly 3000 files I get an HTTP error. The program should skip that file and move on to the next download when this happens.
import requests, bs4, os, time, wget, sys

url = str(input("type the url:"))  # prompt for the page URL
res = requests.get(url)  # fetch the page
res.raise_for_status()  # raise an exception if the request failed
print("request raised")
pdf_links = bs4.BeautifulSoup(res.text, "html5lib")  # parse the page with the html5lib parser
print("read website")
empty_list = []
i = 0
for link in pdf_links.findAll(title="PDF file that opens in a new window"):
    i += 1
    print(i)
    get_url = link.get('href')  # relative link of each PDF
    com_url = "http://ciconline.nic.in//rti/docs/" + str(get_url)  # build the absolute URL
    empty_list.append(com_url)
print("appended list")
for j in range(len(empty_list)):
    while True:
        try:
            list_link = empty_list[j]
            print("downloading %d.%s" % (j, list_link))
            wget.download(list_link, r"D:\RTI\CIC-JAN-MAR-2016")
        except:
            print("Oops!", sys.exc_info()[0], "occurred.")
            print("started iteration %d" % j)
            continue
        break
Answer 0 (score: 1)
You should avoid the while loop here; the version below skips the failing entry and tries to download the next one when an error occurs.
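A minimal sketch of that approach, assuming the same empty_list, wget and sys imports, and download folder as in the question: handle the error inside the for loop and continue with the next link instead of retrying the same one.

for j, list_link in enumerate(empty_list):
    try:
        print("downloading %d.%s" % (j, list_link))
        wget.download(list_link, r"D:\RTI\CIC-JAN-MAR-2016")  # same target folder as in the question
    except Exception:
        print("Oops!", sys.exc_info()[0], "occurred. Skipping link %d" % j)
        continue  # skip this file and move on to the next one

Catching Exception per iteration keeps one bad link from blocking the rest; you could narrow the except clause to whichever error wget.download actually raises for a failed request once you have seen it in the traceback.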