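I'm trying to find an efficient and reproducible way to batch-download full-size image files from Google Image Search. Others have asked similar things, but I haven't found anything that does what I'm looking for, or that I understand.
Most answers refer either to the deprecated Google Image Search API or to the Google Custom Search API, which doesn't seem to cover the whole web, or they only download images from a single URL.
I imagine this could be a two-step process: first, extract all the image URLs from a search, then batch-download from those?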
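The two-step idea is reasonable. Below is a minimal sketch of step two only, using just the standard library. It assumes step one has already produced a plain list of image URLs; the function names (`filename_for`, `fetch_url`, `download_all`) are hypothetical, and the browser-like User-Agent is an assumption about what some servers expect.

```python
import os
import time
import urllib.request
from urllib.parse import urlparse


def filename_for(url, index):
    """Name the local file after the URL's last path segment, prefixed by its index."""
    name = os.path.basename(urlparse(url).path) or 'image_%d' % index
    return '%03d_%s' % (index, name)


def fetch_url(url, timeout=10):
    """Return the raw bytes served at `url` (stdlib only, no requests)."""
    # Assumption: some servers reject Python's default user agent.
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, dest, fetch=fetch_url, delay=1.0):
    """Step two: save every URL in `urls` into `dest`; return the saved paths."""
    os.makedirs(dest, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        try:
            data = fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
            print('could not download %s: %s' % (url, exc))
            continue
        path = os.path.join(dest, filename_for(url, i))
        with open(path, 'wb') as f:
            f.write(data)
        saved.append(path)
        time.sleep(delay)  # pause between requests to avoid hammering servers
    return saved
```

Usage: `download_all(list_of_urls, 'myDirectory')`. The file extension is taken from the URL, so a URL without one produces a file without one; the `fetch` parameter exists so the network call can be swapped out (for example, for `requests`).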
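I should add that I'm a beginner (which is probably obvious; sorry). So if anyone could explain this and point me in the right direction, I'd really appreciate it.
I've also looked at freeware options, but those seem pretty unreliable too, unless someone knows of a reliable one.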
See also: Download images from google image search (python)
Does anyone know anything about these labels, and whether they are stored anywhere or associated with the images? https://en.wikipedia.org/wiki/Google_Image_Labeler
import os
import time
from io import BytesIO

import requests
from PIL import Image
from requests.exceptions import RequestException


def go(query, path):
    """Download full size images from Google image search.

    Don't print or republish images without permission.
    I used this to train a learning algorithm.
    """
    # This endpoint belongs to the deprecated Google Image Search (AJAX) API,
    # so it may no longer return results.
    BASE_URL = 'https://ajax.googleapis.com/ajax/services/search/images'
    BASE_PATH = os.path.join(path, query)
    os.makedirs(BASE_PATH, exist_ok=True)

    start = 0  # Google's start query string parameter for pagination.
    while start < 60:  # Google will only return a max of 56 results.
        # Let requests URL-encode the query instead of concatenating it.
        r = requests.get(BASE_URL, params={'v': '1.0', 'q': query, 'start': start})
        r.raise_for_status()
        for image_info in r.json()['responseData']['results']:
            url = image_info['unescapedUrl']
            try:
                image_r = requests.get(url, timeout=10)
            except RequestException:  # connection errors, timeouts, etc.
                print('could not download %s' % url)
                continue

            # Remove file-system path characters from name.
            title = image_info['titleNoFormatting'].replace('/', '').replace('\\', '')
            file_path = os.path.join(BASE_PATH, '%s.jpg' % title)
            try:
                # Decode the bytes in memory; Pillow writes the file in binary mode.
                Image.open(BytesIO(image_r.content)).save(file_path, 'JPEG')
            except IOError:
                # Throw away some gifs...blegh.
                print('could not save %s' % url)
        print(start)
        start += 4  # 4 images per page.

        # Be nice to Google and they'll be nice back :)
        time.sleep(1.5)


# Example use
go('landscape', 'myDirectory')
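The code above only strips `/` and `\` from result titles before using them as file names. Windows also rejects characters such as `:` and `?`, plus trailing dots and spaces, so a slightly stricter cleanup may be useful. A small sketch; `safe_title` is a hypothetical helper, and the forbidden-character set below is an assumption based on common Windows and POSIX rules.

```python
import re

# Assumed forbidden set: Windows-reserved characters, both path separators,
# and ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_title(title, default='untitled', max_len=100):
    """Turn an arbitrary result title into a file-name-safe string."""
    cleaned = _FORBIDDEN.sub('', title)[:max_len].strip().rstrip('. ')
    return cleaned or default
```

In `go()`, the `title = ...` line could then become `title = safe_title(image_info['titleNoFormatting'])`.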
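I was able to create a Custom Search engine over the entire web, as specified here, and run it successfully to get image links, but as mentioned in the previous post, the results don't exactly match normal Google Image Search results.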
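For reference, step one with such a Custom Search engine can be sketched with the Custom Search JSON API, where `searchType=image` asks for image results and each returned item carries the image URL in its `link` field. This assumes you already have an API key and a search engine ID (`cx`) configured to search the entire web; the key/cx values are placeholders and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_URL = 'https://www.googleapis.com/customsearch/v1'


def build_request_url(api_key, cx, query, start=1, num=10):
    """Build one Custom Search JSON API request restricted to image results."""
    params = {
        'key': api_key,         # placeholder: your API key
        'cx': cx,               # placeholder: your search engine ID
        'q': query,
        'searchType': 'image',  # return image results instead of web pages
        'num': num,             # results per page (the API caps this at 10)
        'start': start,         # 1-based index of the first result
    }
    return API_URL + '?' + urllib.parse.urlencode(params)


def extract_links(response):
    """Pull the direct image URLs out of a decoded JSON response."""
    return [item['link'] for item in response.get('items', [])]


def image_links(api_key, cx, query, pages=3):
    """Step one: collect image URLs over several result pages."""
    links = []
    for page in range(pages):
        url = build_request_url(api_key, cx, query, start=1 + 10 * page)
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
        page_links = extract_links(data)
        if not page_links:  # no more results
            break
        links.extend(page_links)
    return links
```

The returned list can be handed to any downloader. As noted above, these results are not guaranteed to match what the regular Google Images page shows.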
Answer (score: 0)
Try the ImageSoup module. To install it, just run:
pip install imagesoup
Sample code:
>>> from imagesoup import ImageSoup
>>>
>>> soup = ImageSoup()
>>> images_wanted = 50
>>> query = 'landscape'
>>> images = soup.search(query, n_images=images_wanted)
Now you have a list of 50 landscape images from Google Images. Let's play with the first one:
>>> im = images[0]
>>> im.URL
https://static.pexels.com/photos/279315/pexels-photo-279315.jpeg
>>> im.size
(2600, 1300)
>>> im.mode
RGB
>>> im.dpi
(300, 300)
>>> im.color_count
493230
>>> # Let's check the main 4 colors in the image. We use
>>> # reduce_size = True to speed up the process.
>>> im.main_color(reduce_size=True, n=4)
[('black', 0.2244), ('darkslategrey', 0.1057), ('darkolivegreen', 0.0761), ('dodgerblue', 0.0531)]
>>> # Let's take a look at our image
>>> im.show()
>>> # Nice image! Let's save it.
>>> im.to_file('landscape.jpg')
The number of images returned per search can vary; it is usually fewer than 900. To get all of them, set n_images=1000.
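Since `soup.search` returns a list, batch-saving every result is just a loop over it using the `to_file()` method shown above. A minimal sketch: `save_all` is a hypothetical helper (not part of ImageSoup), and it assumes `to_file()` accepts a full path rather than only a bare file name, and that saving may fail for some images (the exact exception type ImageSoup raises is unknown here, hence the broad `except`).

```python
import os


def save_all(images, directory, prefix='image'):
    """Save every ImageSoup result via its to_file() method; return saved paths."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for i, im in enumerate(images):
        path = os.path.join(directory, '%s_%03d.jpg' % (prefix, i))
        try:
            im.to_file(path)
        except Exception as exc:  # assumption: unknown which errors ImageSoup raises
            print('could not save %s: %s' % (getattr(im, 'URL', '?'), exc))
            continue
        saved.append(path)
    return saved


# Usage (requires `pip install imagesoup`):
# from imagesoup import ImageSoup
# images = ImageSoup().search('landscape', n_images=50)
# save_all(images, 'myDirectory', prefix='landscape')
```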
To contribute or report bugs, see the GitHub repo: https://github.com/rafpyprog/ImageSoup