I am scraping a web page, and I need to know how many pages there are to scrape. The pagination looks like this:
<div class="pagination">
<a href="/travel__world-desktop-wallpapers/page/2">2</a>
<a href="/travel__world-desktop-wallpapers/page/3">3</a>
<a href="/travel__world-desktop-wallpapers/page/4">4</a>
...
<a href="/travel__world-desktop-wallpapers/page/31">31</a>
<a href="/travel__world-desktop-wallpapers/page/32">32</a>
<a href="/travel__world-desktop-wallpapers/page/33">33</a>
<a href="/travel__world-desktop-wallpapers/page/2">Next »</a>
</div>
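How do I set up a list comprehension that returns the highest page number (33 in this case)?
Answer 0 (score: 2)
You don't. You set up a generator expression instead: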
max(int(link.text)
for link in soup.find('div', class_='pagination').find_all('a')
if link.text.strip().isdigit())
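To see why a generator expression is enough here, the following is a minimal sketch that uses plain strings in place of the link texts. The `link_texts` list is a hypothetical stand-in, not something parsed from the page. `max()` accepts any iterable, so there is no need to build an intermediate list first. The `default=` argument (Python 3.4+) is an optional addition for the case where no numeric links are found.

```python
# Hypothetical stand-in for the text of each <a> in the pagination div.
link_texts = ["2", "3", "4", "31", "32", "33", "Next »"]

# Generator expression: max() consumes the values one at a time.
last_page = max(int(text) for text in link_texts if text.strip().isdigit())

# Equivalent list comprehension: same result, but builds a throwaway list first.
last_page_from_list = max([int(text) for text in link_texts if text.strip().isdigit()])

# With no numeric links, max() raises ValueError unless a default is supplied.
no_pages = max((int(t) for t in ["Next »"] if t.strip().isdigit()), default=1)

print(last_page, last_page_from_list, no_pages)
```

Both forms give the same answer; the generator expression just skips the temporary list.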
Demo:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <div class="pagination">
... <a href="/travel__world-desktop-wallpapers/page/2">2</a>
... <a href="/travel__world-desktop-wallpapers/page/3">3</a>
... <a href="/travel__world-desktop-wallpapers/page/4">4</a>
... ...
... <a href="/travel__world-desktop-wallpapers/page/31">31</a>
... <a href="/travel__world-desktop-wallpapers/page/32">32</a>
... <a href="/travel__world-desktop-wallpapers/page/33">33</a>
... <a href="/travel__world-desktop-wallpapers/page/2">Next »</a>
... </div>
... ''', 'html.parser')
>>> max(int(link.text) for link in soup.find('div', class_='pagination').find_all('a') if link.text.strip().isdigit())
33
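On a real site the pagination block may be missing, for example when a category has only one page. In that case `soup.find(...)` returns `None` and the one-liner raises `AttributeError`. The sketch below wraps the same expression in a helper that tolerates that case. The function name `last_page_number` and the fallback value of 1 are assumptions for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup


def last_page_number(html):
    """Return the highest numeric page link, or 1 if none is found (assumed fallback)."""
    soup = BeautifulSoup(html, "html.parser")
    pagination = soup.find("div", class_="pagination")
    if pagination is None:
        # No pagination block at all: treat the listing as a single page.
        return 1
    return max(
        (int(link.get_text(strip=True))
         for link in pagination.find_all("a")
         if link.get_text(strip=True).isdigit()),
        default=1,  # pagination block present but without numeric links
    )


sample = """
<div class="pagination">
<a href="/travel__world-desktop-wallpapers/page/2">2</a>
<a href="/travel__world-desktop-wallpapers/page/33">33</a>
<a href="/travel__world-desktop-wallpapers/page/2">Next »</a>
</div>
"""
print(last_page_number(sample))
print(last_page_number("<p>no pagination here</p>"))
```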