Trying to search for images with Google search: error 400

Time: 2018-10-24 14:01:19

Tags: python beautifulsoup

I keep getting this error: urllib.error.HTTPError: HTTP Error 400: Bad Request

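I think it may have something to do with the links, because when I plug them in myself (replacing the {} placeholders) I get the same error, but I don't know which link is the correct one. (Python 3.6, Anaconda)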

import os
import urllib.request as ulib
from bs4 import BeautifulSoup as Soup
import json

url_a = 'https://www.google.com/search?ei=1m7NWePfFYaGmQG51q7IBg&hl=en&q={}'
url_b = '\&tbm=isch&ved=0ahUKEwjjovnD7sjWAhUGQyYKHTmrC2kQuT0I7gEoAQ&start={}'
url_c = '\&yv=2&vet=10ahUKEwjjovnD7sjWAhUGQyYKHTmrC2kQuT0I7gEoAQ.1m7NWePfFYaGmQG51q7IBg'
url_d = '\.i&ijn=1&asearch=ichunk&async=_id:rg_s,_pms:s'
url_base = ''.join((url_a, url_b, url_c, url_d))

headers = {'User-Agent': 'Chrome/69.0.3497.100'}

def get_links(search_name):
    search_name = search_name.replace(' ', '+')
    url = url_base.format(search_name, 0)
    request = ulib.Request(url, data=None, headers=headers)
    json_string = ulib.urlopen(request).read()
    page = json.loads(json_string)
    new_soup = Soup(page[1][1], 'lxml')
    images = new_soup.find_all('img')
    links = [image['src'] for image in images]
    return links

if __name__ == '__main__':
    search_name = 'Thumbs up'
    links = get_links(search_name)

    for link in links:
        print(link)
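A quick way to check what the assembled URL actually sends is to split its query string into parameters. In a normal (non-raw) Python string, '\&' and '\.' are not recognised escape sequences, so the backslash stays in the string. The sketch below spells those fragments with an explicit '\\' and reports which parameters end up carrying a stray backslash. It is only a diagnostic that assumes the question's fragments verbatim: it does not prove the backslashes alone cause the 400 (the ei/ved/vet values were copied from one browser session and may be rejected anyway), and stray_backslash_params is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlsplit

# The question's fragments, with '\\' written out explicitly: this is what
# '\&' and '\.' evaluate to in a non-raw string literal.
url_a = 'https://www.google.com/search?ei=1m7NWePfFYaGmQG51q7IBg&hl=en&q={}'
url_b = '\\&tbm=isch&ved=0ahUKEwjjovnD7sjWAhUGQyYKHTmrC2kQuT0I7gEoAQ&start={}'
url_c = '\\&yv=2&vet=10ahUKEwjjovnD7sjWAhUGQyYKHTmrC2kQuT0I7gEoAQ.1m7NWePfFYaGmQG51q7IBg'
url_d = '\\.i&ijn=1&asearch=ichunk&async=_id:rg_s,_pms:s'
url_base = ''.join((url_a, url_b, url_c, url_d))


def stray_backslash_params(url):
    """Return (name, value) query pairs whose name or value contains a backslash."""
    query = urlsplit(url).query
    pairs = parse_qsl(query, keep_blank_values=True)
    return [(name, value) for name, value in pairs if '\\' in name or '\\' in value]


# Same substitution the question's get_links() performs.
url = url_base.format('Thumbs+up', 0)
for name, value in stray_backslash_params(url):
    print(name, repr(value))
```

With the question's URL this flags q, start and vet, i.e. the values sitting right before each '\&' or '\.'.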

2 answers:

Answer 0 (score: 0)

I think you have a lot of parameters you don't need.

Try this simpler URL for image search:

https://www.google.com/search?q={KEY_WORD}&tbm=isch

For example:

https://www.google.com/search?q=apples&tbm=isch
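Building that URL with urllib.parse.urlencode avoids hand-assembling the query string. This plain search URL returns an ordinary HTML results page rather than the JSON chunk that the question's get_links() passes to json.loads, so the page has to be parsed as HTML directly. The following is a sketch under assumptions: it uses the standard library's html.parser instead of BeautifulSoup so it has no third-party dependencies, the helper names (build_image_search_url, fetch_html, extract_img_srcs) are hypothetical, and Google's result markup is not a stable API, so which <img> tags appear (and whether their src values are real URLs or inline data: URIs) may change or depend on the User-Agent.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlencode

# Assumed header, copied from the question; Google may treat a minimal
# User-Agent string differently from a full browser string.
HEADERS = {'User-Agent': 'Chrome/69.0.3497.100'}


def build_image_search_url(keyword):
    """Build https://www.google.com/search?q=<keyword>&tbm=isch."""
    # urlencode uses quote_plus, so spaces become '+' like replace(' ', '+').
    return 'https://www.google.com/search?' + urlencode({'q': keyword, 'tbm': 'isch'})


def fetch_html(url, timeout=10.0):
    """Download a page and decode it to text (requires network access)."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')


class _ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.srcs.append(src)


def extract_img_srcs(html):
    """Return the src values of all <img> tags in an HTML string."""
    parser = _ImgSrcParser()
    parser.feed(html)
    parser.close()
    return parser.srcs
```

Usage (this performs a live request): links = extract_img_srcs(fetch_html(build_image_search_url('Thumbs up'))). If the request still fails, catching urllib.error.HTTPError and printing err.code and err.read() shows what the server actually sent back.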

Answer 1 (score: 0)

I think the problem is asearch=ichunk&async=_id:rg_s,_pms:s, which doesn't work with search. If I remove it, it works:

import json
import urllib.request as ulib
from bs4 import BeautifulSoup as Soup

# The question's URL without asearch=ichunk&async=_id:rg_s,_pms:s.
# The stray backslashes ('\&', '\.') are dropped, and q={} is kept so that
# url_base.format(search_name, 0) fills in the search term and start=0.
url_a = 'https://www.google.com/search?ei=1m7NWePfFYaGmQG51q7IBg&hl=en&q={}'
url_b = '&tbm=isch&ved=0ahUKEwjjovnD7sjWAhUGQyYKHTmrC2kQuT0I7gEoAQ&start={}'
url_c = '&yv=2&vet=10ahUKEwjjovnD7sjWAhUGQyYKHTmrC2kQuT0I7gEoAQ.1m7NWePfFYaGmQG51q7IBg'
url_d = '.i&ijn=1'
url_base = ''.join((url_a, url_b, url_c, url_d))
print(url_base)

headers = {'User-Agent': 'Chrome/69.0.3497.100'}

def get_links(search_name):
    search_name = search_name.replace(' ', '+')
    url = url_base.format(search_name, 0)
    request = ulib.Request(url, data=None, headers=headers)
    json_string = ulib.urlopen(request).read()
    print(json_string)
    page = json.loads(json_string)
    new_soup = Soup(page[1][1], 'lxml')
    images = new_soup.find_all('img')
    # Skip <img> tags without a src attribute instead of raising KeyError.
    links = [image['src'] for image in images if image.has_attr('src')]
    return links

if __name__ == '__main__':
    search_name = 'Thumbs up'
    links = get_links(search_name)

    for link in links:
        print(link)
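Instead of deleting the parameters from the hard-coded string fragments by hand, the same change can be made programmatically: split the query, drop the named parameters, and rebuild the URL. A minimal sketch with urllib.parse; drop_query_params is a hypothetical helper, and urlencode re-encodes the values it keeps, so their percent-escaping may differ from the original string even though the decoded values are identical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_params(url, names):
    """Return url with every query parameter whose name is in `names` removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in names
    ]
    # urlencode re-quotes the surviving values (spaces become '+').
    return urlunsplit(parts._replace(query=urlencode(kept)))


example = ('https://www.google.com/search?q=a+mouse&tbm=isch&ijn=1'
           '&asearch=ichunk&async=_id:rg_s,_pms:s')
print(drop_query_params(example, {'asearch', 'async'}))
# → https://www.google.com/search?q=a+mouse&tbm=isch&ijn=1
```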