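I'm trying to scrape some images from Google, but this time the site's scroll-down expansion limits me to downloading only a certain number of images. Is there a way to mimic this in Python? For example, could Mechanize be used in this case, if that's possible?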
So I need to simulate Google Image Search's scroll-down expansion to increase the number of returned results, and then scrape the image URLs.
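One way to emulate the scroll-down expansion without a browser is to request each subsequent batch of results directly, as the answer does with its `ijn` (batch index) and `start` (result offset) parameters. The sketch below only shows how batch numbers map to request URLs, assuming 100 results per batch as the answer's code does. `batch_url`, `BASE_URL` and `PER_BATCH` are hypothetical names, and whether Google still honours these parameters is an assumption, not something this verifies.

```python
from urllib.parse import urlencode

# Hypothetical helper: build the URL for one simulated "scroll" batch.
# Parameter names are taken from the answer's code; Google may no longer accept them.
BASE_URL = "https://www.google.dk/search"
PER_BATCH = 100  # batch size implied by ijn = start // 100 in the answer


def batch_url(query, batch):
    params = {
        "q": query,
        "tbm": "isch",               # image search
        "ijn": batch,                # batch (scroll) index
        "start": batch * PER_BATCH,  # offset of the first result in this batch
    }
    return BASE_URL + "?" + urlencode(params)


# One URL per simulated scroll, instead of scrolling in a real browser.
for b in range(3):
    print(batch_url("cats", b))
```

Each printed URL corresponds to one batch the page would otherwise load as you scroll; spaces in the query are encoded as `+` by `urlencode`.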
Answer (score: 3)
This may get you banned quickly, but I'm not sure. It requires BeautifulSoup and requests.
import requests
from bs4 import BeautifulSoup

s = requests.Session()
s.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"})

URL = "https://www.google.dk/search"
images = []


def get_images(query, start):
    """Fetch one batch of image results beginning at result offset `start`."""
    screen_width = 1920
    screen_height = 1080
    params = {
        "q": query,
        "sa": "X",
        "biw": screen_width,
        "bih": screen_height,
        "tbm": "isch",        # image search
        "ijn": start // 100,  # batch index; integer division so it stays an int
        "start": start,       # offset of the first result in this batch
        #"ei": "" - This seems like a unique ID, you might want to use it to avoid getting banned. But you probably still are.
    }
    request = s.get(URL, params=params)
    bs = BeautifulSoup(request.text, "html.parser")
    # Each result is a <div class="rg_di">; its thumbnail URL is in the img's data-src.
    for div in bs.find_all("div", {"class": "rg_di"}):
        img = div.find("img")
        if img is not None and img.get("data-src"):
            images.append(img["data-src"])


# Request 5 batches (start = 0, 100, 200, 300, 400).
for x in range(0, 5):
    get_images("cats", x * 100)

for x in images:
    print(x)
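To check the extraction step without sending any requests to Google, the BeautifulSoup loop can be mirrored with the standard library's `html.parser`. This is a swapped-in, stdlib-only sketch, not the answer's method: `RgDiImageParser` is a hypothetical name, and the `rg_di` class and `data-src` attribute are taken from the answer and may no longer match Google's current markup. Like `div.find("img")` above, it keeps one image per `rg_di` div (here, the first `<img>` that actually has a `data-src`).

```python
from html.parser import HTMLParser


class RgDiImageParser(HTMLParser):
    """Collect the data-src of the first <img> having one inside each <div class="rg_di">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside an rg_di div (counts nested divs too)
        self.taken = False  # already took an image from the current rg_di div?
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1  # a div nested inside an rg_di div
            elif "rg_di" in (attrs.get("class") or "").split():
                self.depth, self.taken = 1, False
        elif tag == "img" and self.depth and not self.taken and attrs.get("data-src"):
            self.urls.append(attrs["data-src"])
            self.taken = True

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


# Offline sample markup (made up for illustration, not real Google output).
sample = (
    '<div class="rg_di"><a><img data-src="http://example.com/1.jpg"></a></div>'
    '<div class="other"><img data-src="http://example.com/2.jpg"></div>'
    '<div class="rg_di x"><img data-src="http://example.com/3.jpg">'
    '<img data-src="http://example.com/4.jpg"></div>'
)
parser = RgDiImageParser()
parser.feed(sample)
print(parser.urls)  # ['http://example.com/1.jpg', 'http://example.com/3.jpg']
```

Feeding it `request.text` from `get_images` should yield the same URLs as the BeautifulSoup loop, as long as the page markup matches what the answer expects.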