Question

So I'm trying to write a Python script that downloads image files. I found this def using Google, but every picture I get it to download ends up "corrupt". Any ideas...

def download(url):
 """Copy the contents of a file from a given URL
 to a local file.
 """
 import urllib
 webFile = urllib.urlopen(url)
 localFile = open(url.split('/')[-1], 'w')
 localFile.write(webFile.read())
 webFile.close()
 localFile.close()

Edit: The code tags didn't preserve the indentation very well, but I can assure you it's there; that's not my problem.

Answer 1

You can simply do

urllib.urlretrieve(url, filename)

and save yourself the trouble.
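
For readers on Python 3, where this function lives in urllib.request rather than urllib, the same idea might look like the sketch below. The wrapper name download and its parameters are only illustrative; they are not part of the original answer.

```python
import urllib.request


def download(url, filename):
    """Save the resource at url to filename and return the local path.

    Illustrative wrapper only: the answer above calls
    urllib.urlretrieve directly (Python 2 spelling).
    """
    # urlretrieve opens the target file in binary mode itself,
    # so the downloaded bytes are written unchanged.
    path, _headers = urllib.request.urlretrieve(url, filename)
    return path
```

Because urlretrieve handles the local file on its own, there is no file mode to get wrong.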

Answer 2

You need to open the local file in binary mode:

localFile = open(url.split('/')[-1], 'wb')

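Otherwise the CR/LF characters in the binary stream will be mangled (on Windows, text mode translates line endings), corrupting the file.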
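
A small sketch of the point, written for Python 3 and using a temporary file instead of a real download: bytes written with 'wb' and read back with 'rb' come out identical, CR/LF bytes included. The helper name roundtrip_binary and the sample payload are made up for illustration.

```python
import os
import tempfile

# The 8-byte PNG signature (it contains both CR LF and a bare LF),
# followed by a few made-up bytes.
payload = b"\x89PNG\r\n\x1a\n" + b"\x00\x01\r\n\x02"


def roundtrip_binary(data):
    """Write data in binary mode, read it back, and return what was read."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:   # 'b': no newline translation
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(roundtrip_binary(payload) == payload)
```

Note that Python 3 is stricter than the Python 2 code in the question: writing a bytes object to a file opened with plain 'w' raises a TypeError instead of silently altering the data.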

Answer 3

If you plan on writing a binary file, you must include the 'b' flag. Line 7 becomes:

localFile = open(url.split('/')[-1], 'wb')

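Nothing else is needed for the code to work, but in the future you might consider:

Importing outside of your functions.
Using os.path.basename, rather than string parsing, to get the name component of a path.
Using the with statement to manage files, rather than closing them manually. It makes your code cleaner, and it ensures the files are properly closed if your code throws an exception.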

I would rewrite your code as:

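import urllib
import os.path
from contextlib import closing

def download(url):
    """Copy the contents of a file from a given URL
    to a local file in the current directory.
    """
    # In Python 2 the object returned by urllib.urlopen() is not a
    # context manager, so wrap it in closing() to use it with "with".
    with closing(urllib.urlopen(url)) as webFile:
        with open(os.path.basename(url), 'wb') as localFile:
            localFile.write(webFile.read())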
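
As a rough Python 3 counterpart of the rewrite above, the sketch below applies the same three suggestions (module-level imports, os.path.basename, with statements). Two things are my own additions rather than part of the answer: the URL is parsed first so a query string does not end up in the file name, and the body is streamed with shutil.copyfileobj instead of being read into memory at once. The names local_name, directory, and the fallback "download.bin" are hypothetical.

```python
import os.path
import shutil
import urllib.request
from urllib.parse import urlparse


def local_name(url):
    """Return the file-name component of the URL's path."""
    # urlparse separates the path from any ?query or #fragment part.
    # "download.bin" is an arbitrary fallback for URLs ending in "/".
    return os.path.basename(urlparse(url).path) or "download.bin"


def download(url, directory="."):
    """Copy the contents of url to a file in directory; return its path."""
    target = os.path.join(directory, local_name(url))
    # Both objects are closed even if the copy raises an exception.
    with urllib.request.urlopen(url) as web_file, \
            open(target, "wb") as local_file:
        shutil.copyfileobj(web_file, local_file)
    return target
```

With a placeholder address such as http://example.com/a/b/cat.png?size=large, local_name would give cat.png, whereas os.path.basename on the raw URL would keep the query string in the name.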

Answer 4

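It comes out corrupt because the function you're using writes the bytes to the file as if they were plain text. What you need to do instead is write the bytes in binary mode (wb). Here's what you should do: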

import urllib

def Download(url, filename):
  Data = urllib.urlopen(url).read()
  File = open(filename, 'wb')
  File.write(Data)
  #Neatly close off the file...
  File.flush()
  File.close()
  #Cleanup, for you neat-freaks.
  del Data, File

Answer 5

import subprocess
outfile = "foo.txt"
url = "http://some/web/site/foo.txt"
cmd = "curl.exe -f -o %(outfile)s %(url)s" % locals()
subprocess.check_call(cmd)

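Shelling out may look less elegant, but once you start running into problems with more complicated sites, curl has a lot of logic to help you get past the obstacles web servers put up (cookies, authentication, sessions, etc.).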

wget is another option.
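
If you go this route, one possible variation (Python 3, a sketch rather than a drop-in replacement) is to pass the command as a list, so that file names or URLs containing spaces or shell metacharacters need no quoting. The helper names and the default executable name "curl" are assumptions; the -f and -o flags are the ones used above.

```python
import shutil
import subprocess


def build_curl_command(url, outfile, curl="curl"):
    """Return the argument list for a curl download."""
    # -f: fail on HTTP errors, -o: write the body to outfile.
    return [curl, "-f", "-o", outfile, url]


def download_with_curl(url, outfile, curl="curl"):
    """Run curl and raise CalledProcessError if it exits non-zero."""
    if shutil.which(curl) is None:
        raise RuntimeError("curl executable not found: %s" % curl)
    subprocess.check_call(build_curl_command(url, outfile, curl))
```

On Windows you would pass curl="curl.exe", or rely on PATH lookup as the snippet above does.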
