Question

I am writing a Python program to download some photos of the students at my school.

Here is my code:`

import os
count = 0
max_c = 1000000
while max_c >= count:
    os.system("curl http://www.tjoernegaard.dk/Faelles/ElevFotos/"+str(count)+".jpg > "+str(count)+".jpg")
    count=count+1

`

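The problem is that I only want to save a jpg if the image exists on the server (that is, the response is not a 404). Since I don't have the image names that are on the server, I have to send a request for every number between 0 and 1000000, but not every number in that range has an image. So I only want to save an image if it exists on the server. How do I do this (on Ubuntu)?

Thanks in advance.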

Answer 1

You can use the "-f" argument to make curl fail silently (no output on HTTP errors), for example:

curl -f site.com/file.jpg
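
As a rough sketch of how this could be wired into the loop from the question (Python 3 assumed; the helper names curl_command and fetch_if_exists are made up for illustration): the "-o" option is used instead of the shell redirect ">", because the redirect creates the output file before curl even runs, so a failed request would still leave an empty .jpg behind. With "-f", curl exits with a non-zero status on an HTTP error, which the code checks.

```python
import os
import subprocess

BASE_URL = "http://www.tjoernegaard.dk/Faelles/ElevFotos/"  # URL from the question


def curl_command(index, base_url=BASE_URL):
    """Build the curl argument list for one image (hypothetical helper)."""
    filename = "{}.jpg".format(index)
    # -f: fail on HTTP errors such as 404 instead of saving the error page
    # -s: silent mode, no progress meter
    # -o: let curl write the file itself rather than using a shell redirect
    return ["curl", "-f", "-s", "-o", filename, base_url + filename]


def fetch_if_exists(index, base_url=BASE_URL):
    """Return True if the image was saved, False otherwise (hypothetical helper)."""
    filename = "{}.jpg".format(index)
    result = subprocess.run(curl_command(index, base_url))
    if result.returncode != 0:
        # Defensive cleanup in case a partial file was left behind.
        if os.path.exists(filename):
            os.remove(filename)
        return False
    return True
```

Looping with `for i in range(1000001): fetch_if_exists(i)` would then cover 0 through 1000000, as in the question.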

Answer 2

import urllib2
import sys

for i in range(1000000):
  try:
    pic = urllib2.urlopen("http://www.tjoernegaard.dk/Faelles/ElevFotos/"+str(i)+".jpg").read()
    # Open in binary write mode; the default read mode would make f.write fail
    with open(str(i).zfill(7)+".jpg", "wb") as f:
      f.write(pic)
    print "SUCCESS "+str(i)
  except KeyboardInterrupt:
    sys.exit(1)
  except urllib2.HTTPError, e:
    # urlopen raises HTTPError for 404 and other HTTP error statuses
    print "ERROR("+str(e.code)+") "+str(i)

This should work: a 404 raises an exception, so no file is written for a missing image.
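
The code above targets Python 2 (urllib2 and the print statement). On Python 3 the same idea might look like the sketch below, where urllib.request.urlopen raises urllib.error.HTTPError for a 404. The function name save_if_exists and its opener parameter are assumptions added for illustration (the parameter only exists so the function can be exercised without touching the network); they are not part of the original answer.

```python
import urllib.error
import urllib.request

BASE_URL = "http://www.tjoernegaard.dk/Faelles/ElevFotos/"  # URL from the question


def save_if_exists(index, base_url=BASE_URL, opener=urllib.request.urlopen):
    """Download one image; return True if saved, False on an HTTP error."""
    url = "{}{}.jpg".format(base_url, index)
    try:
        with opener(url) as resp:
            data = resp.read()
    except urllib.error.HTTPError as e:
        # Raised for 404 and other HTTP error statuses; nothing is written.
        print("ERROR({}) {}".format(e.code, index))
        return False
    # Only reached when the request succeeded.
    with open(str(index).zfill(7) + ".jpg", "wb") as f:
        f.write(data)
    print("SUCCESS {}".format(index))
    return True
```

Calling `save_if_exists(i)` for each `i` in `range(1000001)` would mirror the loop in the question.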

Answer 3

I suggest using the urllib library that ships with Python for this.

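import urllib

count = 0
max_c = 1000000
while max_c >= count:
    # Python 2: urllib.urlopen does not raise on HTTP errors, so check the status code
    resp = urllib.urlopen("http://www.tjoernegaard.dk/Faelles/ElevFotos/"+str(count)+".jpg")
    if resp.getcode() != 404:
        # The image exists: do what you have to do, e.g. save it
        with open(str(count)+".jpg", "wb") as f:
            f.write(resp.read())
    # On a 404, do nothing
    count=count+1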

Answer 4

I think the simplest approach is to use wget instead of curl, which automatically discards 404 responses.
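
A minimal sketch of that approach, kept in Python so it could replace the os.system call in the question (Python 3 assumed; wget_command and fetch_with_wget are hypothetical helper names). It relies on wget not saving a file when the server answers 404 and on its non-zero exit status in that case. The "-O" option is deliberately avoided here, since wget opens the "-O" target up front and can leave an empty file behind on failure.

```python
import subprocess

BASE_URL = "http://www.tjoernegaard.dk/Faelles/ElevFotos/"  # URL from the question


def wget_command(index, dest_dir=".", base_url=BASE_URL):
    """Build the wget argument list for one image (hypothetical helper)."""
    # -q: quiet output
    # -P: directory to save into; the file keeps its name from the URL
    return ["wget", "-q", "-P", dest_dir, "{}{}.jpg".format(base_url, index)]


def fetch_with_wget(index, dest_dir="."):
    """Return True if wget exited successfully, i.e. the image was saved."""
    return subprocess.run(wget_command(index, dest_dir)).returncode == 0
```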

Answer 5

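This is an old question, but I found that in bash you can use --fail, which makes curl fail silently. If the page returns an error, nothing is downloaded...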

curl: only save if the response is not a 404
