I'm new to Python. I'm trying to write a script that downloads PDFs from a website. After downloading roughly 3000 files I get an HTTP error. The program should skip that file and move on to the next download when this happens.
import requests, bs4, os, time, wget, sys

url = str(input("type the url:"))  # prompt for the page URL
res = requests.get(url)  # fetch the page
res.raise_for_status()  # raise an exception if the request failed
print("request raised")
pdf_links = bs4.BeautifulSoup(res.text, "html5lib")  # parse the page with the html5lib parser
print("read website")
empty_list = []
i = 0
for link in pdf_links.findAll(title="PDF file that opens in a new window"):
    i += 1
    print(i)
    get_url = link.get('href')  # relative link of each PDF
    com_url = "http://ciconline.nic.in//rti/docs/" + str(get_url)  # build the absolute URL
    empty_list.append(com_url)
print("appended list")
for j in range(len(empty_list)):
    while True:
        try:
            list_link = empty_list[j]
            print("downloading %d.%s" % (j, list_link))
            wget.download(list_link, r"D:\RTI\CIC-JAN-MAR-2016")
        except:
            print("Oops!", sys.exc_info()[0], "occurred.")
            print("started iteration %d" % j)
            continue
        break
Answer 0 (score: 1)
You should avoid the while loop here; the version below skips the failing entry and tries to download the next one when an error occurs.
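A minimal sketch of that approach, assuming the same empty_list, wget and sys imports, and download folder as in the question: handle the error inside the for loop and continue with the next link instead of retrying the same one.

for j, list_link in enumerate(empty_list):
    try:
        print("downloading %d.%s" % (j, list_link))
        wget.download(list_link, r"D:\RTI\CIC-JAN-MAR-2016")  # same target folder as in the question
    except Exception:
        print("Oops!", sys.exc_info()[0], "occurred. Skipping link %d" % j)
        continue  # skip this file and move on to the next one

Catching Exception per iteration keeps one bad link from blocking the rest; you could narrow the except clause to whichever error wget.download actually raises for a failed request once you have seen it in the traceback.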