Soup.find_all only returns some of the results in Python 3.5.1

Time: 2016-06-05 14:55:36

Tags: python python-3.x beautifulsoup python-requests

I am trying to get the URLs of all the thumbnails on my web page that have class="thumb", but soup.find_all only prints the most recent 22 or so.

Here is the code:

import requests
from bs4 import BeautifulSoup
r = requests.get("http://rayleighev.deviantart.com/gallery/44021661/Reddit")
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("a", {'class' : "thumb"})
for link in links:
    print(link.get("href"))

1 answer:

Answer 0 (score: 0)

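I think you meant to ask how to follow the pagination and grab all of the links in the gallery. Here is an implementation of that idea: request each page through the offset parameter and collect its links, incrementing offset by 24 (the number of links per page) each time, until a page returns no more links: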

import requests
from bs4 import BeautifulSoup


offset = 0
links = []
with requests.Session() as session:
    while True:
        r = session.get("http://rayleighev.deviantart.com/gallery/44021661/Reddit?offset=%d" % offset)
        soup = BeautifulSoup(r.content, "html.parser")
        new_links = [link["href"] for link in soup.find_all("a", {'class': "thumb"})]

        # no more links - break the loop
        if not new_links:
            break

        links.extend(new_links)
        print(len(links))
        offset += 24

print(links)
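
The stopping logic of the loop above can be checked without touching the network. The following is only a sketch under stated assumptions: collect_all_links, fake_fetch, FAKE_GALLERY and the max_pages guard are hypothetical names introduced for illustration and are not part of the original answer. The fetching step is passed in as a function, so a fake in-memory gallery of 50 links can stand in for the real site.

```python
from typing import Callable, List


def collect_all_links(
    fetch_links: Callable[[int], List[str]],
    page_size: int = 24,
    max_pages: int = 1000,
) -> List[str]:
    """Call fetch_links(offset) for offset 0, page_size, 2 * page_size, ...
    and concatenate the results until a page comes back empty."""
    links: List[str] = []
    # max_pages is an assumed safety cap so the loop always terminates.
    for page in range(max_pages):
        new_links = fetch_links(page * page_size)
        if not new_links:  # an empty page means we ran past the end
            break
        links.extend(new_links)
    return links


# Hypothetical stand-in for the gallery: 50 fake thumbnail URLs.
FAKE_GALLERY = ["http://example.com/art/%d" % i for i in range(50)]


def fake_fetch(offset: int) -> List[str]:
    # Simulates one page of results: at most 24 links starting at offset.
    return FAKE_GALLERY[offset:offset + 24]


all_links = collect_all_links(fake_fetch)
print(len(all_links))  # prints 50
```

In a real scraper, fetch_links would wrap the session.get and soup.find_all calls from the answer. The max_pages cap is a defensive choice in case a site keeps returning a non-empty page for offsets past the end of the gallery.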