Question

I am trying to get the URLs of all the thumbnails on my web page, i.e. the links with class="thumb", but soup.find_all only prints the most recent 22 or so.

Here is the code:

import requests
from bs4 import BeautifulSoup
r = requests.get("http://rayleighev.deviantart.com/gallery/44021661/Reddit")
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("a", {'class' : "thumb"})
for link in links:
    print(link.get("href"))

Answer 1

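I think you meant to ask how to follow the pagination and grab all the links in the listing. Here is an implementation of that idea: request each page through the offset parameter and collect its links, incrementing offset by 24 (the number of links per page) until a page returns no links: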

import requests
from bs4 import BeautifulSoup


offset = 0
links = []
with requests.Session() as session:
    while True:
        r = session.get("http://rayleighev.deviantart.com/gallery/44021661/Reddit?offset=%d" % offset)
        soup = BeautifulSoup(r.content, "html.parser")
        new_links = [link["href"] for link in soup.find_all("a", {'class': "thumb"})]

        # no more links - break the loop
        if not new_links:
            break

        links.extend(new_links)
        print(len(links))
        offset += 24

print(links)
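
The same loop can also be split into two pieces, which makes the pagination logic easy to check without touching the network. This is only a sketch: the names collect_all_links and fetch_thumb_links, the max_pages safety cap, and the 10-second timeout are assumptions added for illustration and are not part of the answer above.

```python
from typing import Callable, List


def collect_all_links(fetch_links: Callable[[int], List[str]],
                      page_size: int = 24,
                      max_pages: int = 1000) -> List[str]:
    """Follow offset-based pagination until a page yields no links.

    max_pages is an assumed safety cap so the loop still ends if the
    site keeps returning links for every offset.
    """
    links: List[str] = []
    for page in range(max_pages):
        new_links = fetch_links(page * page_size)
        # no more links - stop paging
        if not new_links:
            break
        links.extend(new_links)
    return links


def fetch_thumb_links(offset: int) -> List[str]:
    """Fetch one gallery page and return the href of each <a class="thumb">."""
    # Imported lazily so collect_all_links can be used and tested
    # without requests or bs4 installed.
    import requests
    from bs4 import BeautifulSoup

    r = requests.get(
        "http://rayleighev.deviantart.com/gallery/44021661/Reddit",
        params={"offset": offset},
        timeout=10,  # assumed value
    )
    r.raise_for_status()
    soup = BeautifulSoup(r.content, "html.parser")
    return [a["href"] for a in soup.find_all("a", class_="thumb")
            if a.has_attr("href")]


if __name__ == "__main__":
    print(collect_all_links(fetch_thumb_links))
```

Because the page fetcher is passed in as a function, a fake fetcher that returns canned lists can stand in for the real site when trying out the loop.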

Soup.find_all returns only some of the results in Python 3.5.1
