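I am trying to scrape this page (the main page).
Now I want to scrape all the images by going into each product page.
So the flow should be: go from the main page to a product page, download all of that product's images, return to the main page, go into the next product page, and so on.
I am using the requests library. Below is my code for getting the names and images from the main page.
How can I extend this code to get the product images from the product pages?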
import math

import requests
from pandas import DataFrame

url = 'https://middleware.paytmmall.com/fmcg-foods-glpid-101405'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
payload = {
    'channel': 'web',
    'child_site_id': '6',
    'site_id': '2',
    'version': '2',
    'discoverability': 'online',
    'use_mw': '1',
    'category': '101405',
    'page': '1',
    'page_count': '1',
    'items_per_page': '32'}

# Total number of pages needed (32 items per page)
jsonData = requests.post(url, headers=headers, data=payload).json()
total_count = jsonData['totalCount']
total_pages = total_count / 32
pages = math.ceil(total_pages)

NAME = []
IMG = []
for page in range(1, pages + 1):
    # Request the next page of the listing
    payload.update({'page': page, 'page_count': page})
    jsonData = requests.post(url, headers=headers, data=payload).json()

    for product in jsonData['grid_layout']:
        name = product['name']
        img = product['image_url']
        print ('Name: %s\nImage: %s\n' %(name, img))
        NAME.append(name)
        IMG.append(img)
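For example: this is the page of the first product on the main page. I want to download all the product images from there, return to the main page, and then move on to the next product page.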
Answer 0 (score: 1)
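Add the following code right after the print ('Name: %s\nImage: %s\n' %(name, img)) statement.
It downloads each image and saves it in the directory the script is run from.
Each image is saved under the file name taken from the last segment of its URL.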
imagename = img.split("/")[-1]
r = requests.get(img)
if r.status_code == 200:
    with open(imagename, 'wb') as f:
        f.write(r.content)
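A slightly more defensive variant of the snippet above, offered as a sketch rather than a definitive implementation. The helper names `filename_from_url` and `download_image`, the `dest_dir` parameter, and the 10-second timeout are all assumptions made for illustration. The `session` argument is expected to be a `requests.Session()` (or anything else with a compatible `get` method).

```python
import os
from urllib.parse import urlparse


def filename_from_url(img_url):
    """Return the last path segment of the URL, ignoring any query string."""
    return os.path.basename(urlparse(img_url).path)


def download_image(session, img_url, dest_dir="."):
    """Save the image at img_url into dest_dir.

    Returns the path written, or None if the URL has no file name
    or the server did not answer with status 200.
    """
    filename = filename_from_url(img_url)
    if not filename:
        return None

    # timeout=10 is an arbitrary choice so a stalled request cannot hang the loop
    response = session.get(img_url, timeout=10)
    if response.status_code != 200:
        return None

    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, filename)
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```

Inside the loop from the question, this could be called as `download_image(session, img, "images")` after creating `session = requests.Session()` once before the loop. Reusing one session also reuses the underlying connection across downloads.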
OR:
If you don't want to save the images to disk and only want to work with the image content in memory, try this instead.
imagename = img.split("/")[-1]
r = requests.get(img)
if r.status_code == 200:
    # Keep the raw image bytes in memory instead of writing a file
    img_dict = dict(imageName=imagename,content=r.content)
    NAME.append(name)
    IMG.append(img_dict)
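As one illustration of working with the in-memory content, the sketch below summarises a record shaped like `img_dict` without writing anything to disk. The function names `guess_image_format` and `describe_image` are hypothetical, and the format check only looks at the well-known leading signature bytes of JPEG, PNG, and GIF files. It is not a full validation of the image.

```python
def guess_image_format(content):
    """Guess the image format from the leading signature bytes."""
    if content.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if content.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if content[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return "unknown"


def describe_image(img_dict):
    """Summarise a record shaped like dict(imageName=..., content=...)."""
    content = img_dict["content"]
    return {
        "imageName": img_dict["imageName"],
        "bytes": len(content),
        "format": guess_image_format(content),
    }
```

For example, `[describe_image(d) for d in IMG]` would give one summary per downloaded image once `IMG` has been filled with `img_dict` records.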
Update:
To get all the images of a product:
# Inside the `for product in jsonData['grid_layout']` loop:
img_url = product['url']
img_response = requests.get(img_url).json()
if "other_images" in img_response:
    print(img_response['other_images'])
Output:
[
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASRLNC-C-500GNTBL4974726639099/a_15.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASRLNC-C-500GNTBL4974726639099/a_16.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASRLNC-C-500GNTBL4974726639099/a_17.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASRLNC-C-500GNTBL4974726639099/a_18.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASRLNC-C-500GNTBL4974726639099/a_19.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASRLNC-C-500GNTBL4974726639099/a_20.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASRLNC-C-500GNTBL4974726639099/a_21.jpg'
]
[
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASTAJ-MAHAL-TETBL4974748E953C4/a_22.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASTAJ-MAHAL-TETBL4974748E953C4/a_23.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASTAJ-MAHAL-TETBL4974748E953C4/a_24.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASTAJ-MAHAL-TETBL4974748E953C4/a_25.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASTAJ-MAHAL-TETBL4974748E953C4/a_26.jpg',
'https://assetscdn1.paytm.com/images/catalog/product/F/FA/FASTAJ-MAHAL-TETBL4974748E953C4/a_27.jpg'
]
.....
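Putting the pieces together, here is a minimal sketch of the full flow for one product: fetch the JSON behind `product['url']`, read `other_images`, and save each image. The function names, the `dest_root` parameter, and the 10-second timeout are assumptions for illustration. Using the second-to-last URL segment as a per-product folder is also an assumption, based only on the URL shape in the output above. It is there because file names such as `a_15.jpg` could repeat across products and overwrite each other in a single directory.

```python
import os


def product_image_urls(product_json):
    """Return the URLs under 'other_images', or an empty list if absent."""
    images = product_json.get("other_images") or []
    return [u for u in images if isinstance(u, str)]


def download_product_images(session, product, dest_root="images"):
    """Fetch product['url'] and save every image listed under 'other_images'.

    Returns the list of file paths that were written.
    """
    response = session.get(product["url"], timeout=10)
    if response.status_code != 200:
        return []

    saved = []
    for img_url in product_image_urls(response.json()):
        parts = img_url.split("/")
        if len(parts) < 2 or not parts[-1]:
            continue

        img_response = session.get(img_url, timeout=10)
        if img_response.status_code != 200:
            continue  # skip images that fail; keep going with the rest

        # Assumed layout: .../<product-folder>/<file-name>
        folder = os.path.join(dest_root, parts[-2])
        os.makedirs(folder, exist_ok=True)
        path = os.path.join(folder, parts[-1])
        with open(path, "wb") as f:
            f.write(img_response.content)
        saved.append(path)
    return saved
```

In the question's loop this would be called as `download_product_images(session, product)` for each `product` in `jsonData['grid_layout']`, with `session = requests.Session()` created once up front. If the product endpoint needs the same User-Agent as the listing request, `session.headers.update(headers)` sets it for every call.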