I'm trying to get the URLs of all the thumbnails on my web page that have class="thumb", but soup.find_all only returns the most recent 22 or so.
Here is the code:
import requests
from bs4 import BeautifulSoup

r = requests.get("http://rayleighev.deviantart.com/gallery/44021661/Reddit")
soup = BeautifulSoup(r.content, "html.parser")
# every <a> element whose class list contains "thumb"
links = soup.find_all("a", {"class": "thumb"})
for link in links:
    print(link.get("href"))
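For illustration, here is a stdlib-only sketch of the class match that `soup.find_all("a", {"class": "thumb"})` performs. `ThumbLinkParser` and `thumb_links` are hypothetical names and the HTML snippet is made up. Like `find_all`, it can only return anchors that are present in the HTML it is given, so a single request can only yield the thumbnails on the one page the server sent back.

```python
from html.parser import HTMLParser


class ThumbLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose class list contains "thumb".

    Hypothetical stdlib stand-in for soup.find_all("a", {"class": "thumb"})
    followed by link.get("href").
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # class is multi-valued (as in BeautifulSoup): "thumb" may be one of several classes
        classes = (attr_map.get("class") or "").split()
        if "thumb" in classes:
            # None if the anchor has no href, mirroring link.get("href")
            self.hrefs.append(attr_map.get("href"))


def thumb_links(html):
    """Return the hrefs of all class="thumb" anchors found in one HTML document."""
    parser = ThumbLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = """
<div>
  <a class="thumb" href="/art/one">1</a>
  <a class="thumb lit" href="/art/two">2</a>
  <a class="other" href="/art/three">3</a>
</div>
"""
print(thumb_links(sample))  # ['/art/one', '/art/two']
```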
Answer (score: 0)
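I think what you're actually asking is how to follow the pagination and collect all the links in the gallery. Here is an implementation of that idea: pass the offset parameter and keep grabbing links, incrementing offset by 24 (the number of links per page), until a page returns no more links: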
import requests
from bs4 import BeautifulSoup

offset = 0
links = []
with requests.Session() as session:
    while True:
        # request one page of the gallery, starting at the current offset
        r = session.get("http://rayleighev.deviantart.com/gallery/44021661/Reddit?offset=%d" % offset)
        soup = BeautifulSoup(r.content, "html.parser")
        new_links = [link["href"] for link in soup.find_all("a", {"class": "thumb"})]
        # no more links - break the loop
        if not new_links:
            break
        links.extend(new_links)
        print(len(links))  # running total so far
        offset += 24  # 24 links per page: move on to the next page
print(links)
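The loop above can be factored into a helper that is testable without network access. The sketch below assumes a hypothetical callable `fetch_links(offset)` that returns the thumbnail hrefs for one page (in practice, the `session.get` plus `find_all` from the answer). The names `collect_all` and `max_pages`, and the repeated-page check, are additions, not part of the original answer; the check guards against the assumed case where a server answers an out-of-range offset by repeating an earlier page instead of returning an empty one.

```python
def collect_all(fetch_links, page_size=24, max_pages=1000):
    """Call fetch_links(offset) for offset = 0, page_size, 2 * page_size, ...
    and return every link in order, stopping at the first empty page.

    fetch_links is a hypothetical callable: one page request plus find_all.
    """
    links = []
    seen = set()
    offset = 0
    for _ in range(max_pages):  # hard cap so a misbehaving server cannot loop forever
        new_links = fetch_links(offset)
        if not new_links:
            break  # no more links: same stop condition as the answer
        fresh = [link for link in new_links if link not in seen]
        if not fresh:
            break  # assumed case: the server repeated an earlier page
        seen.update(fresh)
        links.extend(fresh)
        offset += page_size
    return links


# Fake gallery of 50 items, served 24 at a time (stands in for real HTTP requests)
ALL = [f"/art/{i}" for i in range(50)]


def fake_fetch(offset):
    return ALL[offset:offset + 24]


result = collect_all(fake_fetch)
print(len(result))  # 50: offsets 0, 24 and 48 return links, offset 72 returns none
```

With requests, a real `fetch_links` could pass the offset as `session.get(url, params={"offset": offset})` rather than formatting it into the URL string.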