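How to download a large number of images from Google or any website

Time: 2018-10-10 01:50:14

Tags: python-3.x

I need to do a machine learning project, and for it I have to train on a large number of images. I searched for a solution to this problem but had no luck. Can anyone help me with this? Thanks in advance.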

2 answers:

Answer 0 (score: 1)

I downloaded images from Google Images using Selenium. This is just a basic approach.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import os
import time
import urllib.request

# Note: this uses the Selenium 3 API from the original answer. Selenium 4 replaces
# the find_element_by_* helpers with find_element(By.NAME, ...) and similar calls.

key_words = "puppies"          # placeholder: the search terms for the images you need
required_images_number = 100   # placeholder: how many images to download

browser = webdriver.Chrome("path\\to\\the\\webdriverFile")
browser.get("https://www.google.com")

# Type the query into the search box and submit it.
search = browser.find_element_by_name('q')
search.send_keys(key_words, Keys.ENTER)

# Switch to the Images tab.
elem = browser.find_element_by_link_text('Images')
elem.click()

# Scroll down repeatedly so that more thumbnails get loaded.
for _ in range(20):
    browser.execute_script("window.scrollBy(0, 1000);")
    time.sleep(3)

# Collect every <img> element inside the results container.
elem1 = browser.find_element_by_id('islmp')
sub = elem1.find_elements_by_tag_name("img")

os.makedirs('downloads', exist_ok=True)

count = 0
for img in sub:
    src = img.get_attribute('src')
    if src is None:
        # Some <img> elements have no src yet (not loaded); skip them.
        print('fail')
        continue
    print(src)
    count += 1
    urllib.request.urlretrieve(src, os.path.join('downloads', 'image' + str(count) + '.jpg'))
    if count == required_images_number:
        break

See this for a detailed explanation.

Download the driver here.
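The download loop above saves every thumbnail as `.jpg`, whatever its real format. As a minimal sketch, assuming the `src` values are either ordinary http(s) URLs or inline `data:` URIs (an assumption about what the results page serves, not something the answer states), the hypothetical helper `save_image` below picks the file extension from the reported content type:

```python
import os
import urllib.request


def save_image(src, directory, index):
    """Save one image given by `src` and return the path it was written to.

    `src` may be an http(s) URL or a data: URI; urlopen handles both.
    """
    os.makedirs(directory, exist_ok=True)
    with urllib.request.urlopen(src) as response:
        content_type = response.headers.get_content_type()  # e.g. "image/png"
        data = response.read()
    # Fall back to "jpg" when the server does not report an image type.
    if content_type.startswith("image/"):
        extension = content_type.split("/")[-1]
    else:
        extension = "jpg"
    path = os.path.join(directory, "image{}.{}".format(index, extension))
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

Inside the loop, a call such as `save_image(src, 'downloads', count)` could stand in for the `urlretrieve` line.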

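Answer 1 (score: 0)

My suggestion: use an image API. My favorite is the Bing Image Search API.

The following is the text from "Send search queries using the REST API and Python".

Running the quickstart

First, set subscription_key to a valid subscription key for the Bing API service.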

Python

subscription_key = None
assert subscription_key
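Rather than pasting the key into the script, one option is to read it from the environment. This is only a sketch: the variable name `BING_SEARCH_V7_SUBSCRIPTION_KEY` and the helper `load_subscription_key` are made up for illustration and are not part of the quickstart.

```python
import os


def load_subscription_key(variable="BING_SEARCH_V7_SUBSCRIPTION_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical choice; use whatever name you export.
    """
    key = os.environ.get(variable, "").strip()
    if not key:
        raise RuntimeError(
            "Set the {} environment variable to your Bing API key.".format(variable)
        )
    return key
```

With the variable exported in your shell, `subscription_key = load_subscription_key()` replaces the hard-coded assignment.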

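Next, verify that the search_url endpoint is correct. At the time of writing, only one endpoint is used for the Bing search APIs. If you encounter authorization errors, double-check this value against the Bing search endpoint in your Azure dashboard.

Python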

search_url = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"

Set search_term to look for images of puppies.

Python

search_term = "puppies"

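The following block uses the requests library in Python to call the Bing search API and return the results as a JSON object. Note that the API key is passed through the headers dictionary and the search term through the params dictionary. For the full list of options that can be used to filter search results, see the REST API documentation.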

Python

import requests

headers = {"Ocp-Apim-Subscription-Key" : subscription_key}
params  = {"q": search_term, "license": "public", "imageType": "photo"}
response = requests.get(search_url, headers=headers, params=params)
response.raise_for_status()
search_results = response.json()
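A single request returns only one page of results, which is rarely enough for a training set. The sketch below pages through results using the API's `count` and `offset` query parameters and the `nextOffset` field that appears in the sample response. The helper names `fetch_page` and `collect_results` are hypothetical, and the page size of 50 is an arbitrary choice.

```python
def fetch_page(search_url, subscription_key, search_term, offset, count=50):
    """Request one page of image results and return the parsed JSON."""
    import requests  # third-party; imported here so collect_results has no dependencies

    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    params = {"q": search_term, "count": count, "offset": offset}
    response = requests.get(search_url, headers=headers, params=params)
    response.raise_for_status()
    return response.json()


def collect_results(get_page, wanted):
    """Gather up to `wanted` results; get_page(offset) must return one parsed page."""
    results = []
    offset = 0
    while len(results) < wanted:
        page = get_page(offset)
        batch = page.get("value", [])
        if not batch:
            break  # no more results
        results.extend(batch)
        next_offset = page.get("nextOffset")
        if next_offset is None or next_offset <= offset:
            break  # the API gave no usable position to continue from
        offset = next_offset
    return results[:wanted]
```

For example, `collect_results(lambda off: fetch_page(search_url, subscription_key, search_term, off), 500)` would try to gather 500 results. Passing the page getter in as a function keeps the paging logic easy to test without network access.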

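The search_results object contains the actual images along with rich metadata such as related items. For example, the following line of code extracts the thumbnail URLs for the first 16 results.

Python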

thumbnail_urls = [img["thumbnailUrl"] for img in search_results["value"][:16]]
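For a training set you will probably want the full-size files rather than thumbnails. Here is a minimal sketch that uses the `contentUrl` and `encodingFormat` fields shown in the sample response. The helpers `build_download_plan` and `download_all`, the `downloads` directory, and the file naming scheme are illustrative assumptions.

```python
import os


def build_download_plan(search_results, directory="downloads", limit=None):
    """Return (url, target_path) pairs for results that carry a contentUrl."""
    plan = []
    items = search_results.get("value", [])[:limit]
    for index, item in enumerate(items, start=1):
        url = item.get("contentUrl")
        if not url:
            continue  # skip entries without a full-size image URL
        extension = item.get("encodingFormat", "jpg")
        filename = "image{:04d}.{}".format(index, extension)
        plan.append((url, os.path.join(directory, filename)))
    return plan


def download_all(plan, timeout=10):
    """Download every (url, path) pair, skipping links that fail; return saved paths."""
    import requests  # third-party; imported here so the planning step has no dependencies

    saved = []
    for url, path in plan:
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
        except requests.RequestException:
            continue  # dead or slow hosts are common in search results
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as handle:
            handle.write(response.content)
        saved.append(path)
    return saved
```

A call such as `download_all(build_download_plan(search_results))` then writes the images to disk. Keeping the planning step separate from the network step makes it possible to inspect the list before downloading anything.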

Then download the thumbnail images with requests, open them with the PIL library, and use the matplotlib library to render them on a $4 \times 4$ grid.

Python

%matplotlib inline
import matplotlib.pyplot as plt
from PIL import Image
from io import BytesIO

f, axes = plt.subplots(4, 4)
for i in range(4):
    for j in range(4):
        image_data = requests.get(thumbnail_urls[i+4*j])
        image_data.raise_for_status()
        image = Image.open(BytesIO(image_data.content))        
        axes[i][j].imshow(image)
        axes[i][j].axis("off")
plt.show()

Sample JSON response

Responses from the Bing Image Search API are returned as JSON. This sample response has been truncated to show a single result.

JSON

{
"_type":"Images",
"instrumentation":{
    "_type":"ResponseInstrumentation"
},
"readLink":"images\/search?q=tropical ocean",
"webSearchUrl":"https:\/\/www.bing.com\/images\/search?q=tropical ocean&FORM=OIIARP",
"totalEstimatedMatches":842,
"nextOffset":47,
"value":[
    {
        "webSearchUrl":"https:\/\/www.bing.com\/images\/search?view=detailv2&FORM=OIIRPO&q=tropical+ocean&id=8607ACDACB243BDEA7E1EF78127DA931E680E3A5&simid=608027248313960152",
        "name":"My Life in the Ocean | The greatest WordPress.com site in ...",
        "thumbnailUrl":"https:\/\/tse3.mm.bing.net\/th?id=OIP.fmwSKKmKpmZtJiBDps1kLAHaEo&pid=Api",
        "datePublished":"2017-11-03T08:51:00.0000000Z",
        "contentUrl":"https:\/\/mylifeintheocean.files.wordpress.com\/2012\/11\/tropical-ocean-wallpaper-1920x12003.jpg",
        "hostPageUrl":"https:\/\/mylifeintheocean.wordpress.com\/",
        "contentSize":"897388 B",
        "encodingFormat":"jpeg",
        "hostPageDisplayUrl":"https:\/\/mylifeintheocean.wordpress.com",
        "width":1920,
        "height":1200,
        "thumbnail":{
            "width":474,
            "height":296
        },
        "imageInsightsToken":"ccid_fmwSKKmK*mid_8607ACDACB243BDEA7E1EF78127DA931E680E3A5*simid_608027248313960152*thid_OIP.fmwSKKmKpmZtJiBDps1kLAHaEo",
        "insightsMetadata":{
            "recipeSourcesCount":0,
            "bestRepresentativeQuery":{
                "text":"Tropical Beaches Desktop Wallpaper",
                "displayText":"Tropical Beaches Desktop Wallpaper",
                "webSearchUrl":"https:\/\/www.bing.com\/images\/search?q=Tropical+Beaches+Desktop+Wallpaper&id=8607ACDACB243BDEA7E1EF78127DA931E680E3A5&FORM=IDBQDM"
            },
            "pagesIncludingCount":115,
            "availableSizesCount":44
        },
        "imageId":"8607ACDACB243BDEA7E1EF78127DA931E680E3A5",
        "accentColor":"0050B2"
    }
]
}
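Each result in the response above carries `width` and `height` fields, which can be used to discard images too small for training before downloading them. A minimal sketch follows; the helper name `filter_by_min_size` and the thresholds in the usage note are hypothetical.

```python
def filter_by_min_size(items, min_width=0, min_height=0):
    """Keep only results whose reported dimensions meet both minimums.

    Results that lack a width or height field are treated as size 0.
    """
    return [
        item
        for item in items
        if item.get("width", 0) >= min_width and item.get("height", 0) >= min_height
    ]
```

For instance, `filter_by_min_size(search_results["value"], 640, 480)` keeps the 1920 by 1200 result shown in the sample.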