我需要抓取代码中给出的url页面的所有图像,但是我只能手动完成每一页直到最后一页(第100页)。
这是抓取每页的代码,我每次都替换页码并运行代码!
下方
在这种情况下,是否有任何方法可以添加变量函数并运行循环直到出现错误,直到出现404页(因为不再剩余页)?
from bs4 import*
import requests as rq
r2 = rq.get("https://www.gettyimages.in/photos/aishwarya-rai?family=editorial&page=1&phrase=aishwarya%20rai&sort=mostpopular")
soup2 = BeautifulSoup(r2.text, "html.parser")
links = []
x = soup2.select('img[src^="https://media.gettyimages.com/photos/"]') #the frame where it shows the images
for img in x:
links.append(img['src'])
for index, img_link in enumerate(links):
img_data = rq.get(img_link).content
with open("aishwarya_rai/"+str(index+2)+'.jpg', 'wb+') as f:
f.write(img_data)
else:
f.close()
答案 0 :(得分:0)
使用format()
函数并传递页面变量。
from bs4 import*
import requests as rq
url="https://www.gettyimages.in/photos/aishwarya-rai?family=editorial&page={}&phrase=aishwarya%20rai&sort=mostpopular"
links = []
for page in range(1,101):
print(url.format(page))
r2 = rq.get(url.format(page))
soup2 = BeautifulSoup(r2.text, "html.parser")
x = soup2.select('img[src^="https://media.gettyimages.com/photos/"]')
for img in x:
links.append(img['src'])
print(links)