I wrote some code to download and save images from a website. It works well, but for some URLs it raises an error. I have pasted the code below:
import urllib2
import webbrowser
imageurl='http://www.example.com/'+image[s]
opener1 = urllib2.build_opener()
page1=opener1.open(imageurl)
my_picture=page1.read()
image1=image[s].replace("/","")
fout = open('images/tony/'+image1, "wb")
fout.write(my_picture)
fout.close()
I actually get a lot of image values, and almost all of them work. But when image[s] = 'images/PG013001 GROUP 2.jpg', the interpreter raises an error:
File "leather.py", line 37, in get_leather
    page1=opener1.open(imageurl)
File "D:\Program Files\Python\lib\urllib2.py", line 395, in open
    response = meth(req, response)
File "D:\Program Files\Python\lib\urllib2.py", line 508, in http_response
    'http', request, response, code, msg, hdrs)
File "D:\Program Files\Python\lib\urllib2.py", line 433, in error
    return self._call_chain(*args)
File "D:\Program Files\Python\lib\urllib2.py", line 367, in _call_chain
    result = func(*args)
File "D:\Program Files\Python\lib\urllib2.py", line 516, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
I assumed the corresponding image, 'http://www.example.com/images/PG013001 GROUP 2.jpg', did not exist, but when I checked, it does exist. Please suggest a fix.
Regards
Answer 0 (score: 1)
You need to fix that URL by escaping it. Try this:
>>> import urllib
>>> urllib.quote("images/PG013001 GROUP 2.jpg")
'images/PG013001%20GROUP%202.jpg'
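Applied to the question's code, the idea is to escape only the relative path and then prepend the host. A minimal sketch, assuming a hypothetical helper named build_image_url (it is not part of the original code) and an import fallback so the same snippet runs on Python 2 (urllib.quote) and Python 3 (urllib.parse.quote):

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def build_image_url(base, relative_path):
    """Hypothetical helper: percent-encode the path, then join it to the base URL."""
    # quote() leaves '/' untouched by default, so directory separators survive
    # while spaces become %20.
    return base + quote(relative_path)


print(build_image_url('http://www.example.com/', 'images/PG013001 GROUP 2.jpg'))
# prints: http://www.example.com/images/PG013001%20GROUP%202.jpg
```

The resulting string can then be passed to opener1.open() in place of the unescaped imageurl.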
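Answer 1 (score: 1)
URLs cannot contain spaces directly; that is simply not allowed. What you want to do is quote, or encode, the spaces in the file name so that the URL becomes legal. Wikipedia has more on the matter.
So you want to quote the URL that you pass to urllib2. In your code, you can do that by changing one line to look like this: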
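# Quote only the path: quoting the whole URL would also escape the ':' in 'http:'.
page1=opener1.open('http://www.example.com/'+urllib2.quote(image[s]))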
That should do it.
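One detail to watch: by default quote treats only '/' as safe, so a full URL passed through it also has the ':' after the scheme escaped, and the result is no longer a usable http URL. A small sketch of the difference, with illustrative variable names and an import fallback added here as an assumption so that it runs on Python 2 or 3:

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

url = 'http://www.example.com/images/PG013001 GROUP 2.jpg'

# Default safe='/' escapes the colon as well as the spaces.
whole = quote(url)
# Marking ':' as safe keeps the scheme intact and still escapes the spaces.
kept = quote(url, safe=':/')

print(whole)  # http%3A//www.example.com/images/PG013001%20GROUP%202.jpg
print(kept)   # http://www.example.com/images/PG013001%20GROUP%202.jpg
```

Quoting just the path portion, or passing safe=':/' when quoting a simple URL like this one, avoids the problem.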