Error downloading images from Google using Python?

Asked: 2014-02-06 20:25:45

Tags: python image downloading

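I'm new to Python and am trying to download images from Google automatically. I want to enter a keyword and have my program download the matching images from Google and save them to a folder on my computer. Here is my code: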

import json
import os
import time
import requests
from PIL import Image
from StringIO import StringIO
from requests.exceptions import ConnectionError


def go(query, path):

    BASE_URL = 'https://ajax.googleapis.com/ajax/services/search/images?'\
               'v=1.0&q=' + query + '&start=%d'

    BASE_PATH = os.path.join(path, query)

    if not os.path.exists(BASE_PATH):
        os.makedirs(BASE_PATH)

    start = 0 # Google's start query string parameter for pagination.
    while start < 60: # Google will only return a max of 56 results.
        r = requests.get(BASE_URL % start)
        for image_info in json.loads(r.text)['responseData']['results']:
            url = image_info['unescapedUrl']
            try:
                image_r = requests.get(url)
            except ConnectionError, e:
                print 'could not download %s' % url
                continue

            # Remove file-system path characters from name.
            title = image_info['titleNoFormatting'].replace('/', '').replace('\\', '')

            file = open(os.path.join(BASE_PATH, '%s.jpg') % title, 'w')
            try:
                Image.open(StringIO(image_r.content)).save(file, 'JPEG')
            except IOError, e:
                # Throw away some gifs
                print 'could not save %s' % url
                continue
            finally:
                file.close()

        print start
        start += 4 # 4 images per page.

        time.sleep(1.5)

Example usage:

go('angry human face', 'myDirectory')

But I keep getting an error that says:

file = open(os.path.join(BASE_PATH, '%s.jpg') % title, 'w')
IOError: [Errno 22] invalid mode ('w') or filename: u'myDirectory\\landscape\\Nature - Landscapes - Views - Desktop Wallpapers |    MIRIADNA..jpg'

What can I do to fix this? Please help! I would really appreciate it.

1 answer:

Answer 0 (score: 1)

filename: u'... - Desktop Wallpapers |    MIRIADNA..jpg'
                                     ^ This is a problem

Windows does not allow the pipe character (|) in file names.

From http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx:

  

The following reserved characters:

  • < (less than)
  • > (greater than)
  • : (colon)
  • " (double quote)
  • / (forward slash)
  • \ (backslash)
  • | (vertical bar or pipe)
  • ? (question mark)
  • * (asterisk)

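In your case, a reserved character appears in the title of the image you are downloading, and that title is then used as your file name. You can strip these characters out quite easily, for example: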

title = ''.join(c for c in title if c not in '<>:"/\\|?*')
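
If you want something reusable, the same idea can be wrapped in a small helper. This is only a sketch: `sanitize_title` and `RESERVED_CHARS` are hypothetical names that are not in your code. Besides removing the reserved characters listed above, it also strips trailing spaces and periods, since the same MSDN page says a name should not end with either.

```python
import re

# Characters Windows reserves in file names (the MSDN list quoted above).
RESERVED_CHARS = '<>:"/\\|?*'
_RESERVED_RE = re.compile('[' + re.escape(RESERVED_CHARS) + ']')


def sanitize_title(title, replacement=''):
    """Return title with Windows-reserved characters replaced.

    Hypothetical helper for illustration; not part of the original script.
    """
    cleaned = _RESERVED_RE.sub(replacement, title)
    # Windows also rejects names that end in a space or a period.
    return cleaned.rstrip(' .')


# The title from the error message, which contains a pipe character.
title = u'Nature - Landscapes - Views - Desktop Wallpapers |    MIRIADNA.'
print(sanitize_title(title))
```

In your loop you would call it right after building `title`, before the `open(...)` line. Passing `replacement='_'` keeps a visible marker where a character was removed instead of dropping it silently.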