I'm new to BeautifulSoup. This is the HTML snippet I'm interested in:
<div class="jpag" id="srchpagination"><a rel='prev' class="dis"><span>‹‹</span> Prev</a><span class="act">1</span><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2' >2</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3' >3</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4' >4</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5' >5</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6' >6</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7' >7</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8' >8</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9' >9</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10' >10</a><a rel='next' href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>Next
<span>››</span></a></div>
I want to check whether the last page number in the 'a' tags is 10. I was able to get the enclosing div with this command:
atags1 = bSoup.find('div', attrs={'class': 'jpag'})
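Now I want to iterate over the 'a' tags that do not have an attribute such as rel="prev" or rel="next", so that I only iterate over the 'a' tags that hold page numbers. Please help. Thanks in advance.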
Answer 0 (score: 2)
There are many ways to do this. A simple one is to select the anchors inside the div and filter out any anchor that has a rel attribute:
html = """<div class="jpag" id="srchpagination"><a rel='prev' class="dis"><span>‹‹</span> Prev</a><span class="act">1</span><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2' >2</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3' >3</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4' >4</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5' >5</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6' >6</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7' >7</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8' >8</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9' >9</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10' >10</a><a rel='next' href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>Next
<span>››</span></a></div>"""
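from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, "html.parser")

# Anchors inside the pagination div that carry an href
for a in soup.select("#srchpagination a[href]"):
    # Prev/Next are the only links with a rel attribute, so skip them
    if not a.get("rel"):
        print(a)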
Which will give you:
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2">2</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3">3</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4">4</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5">5</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6">6</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7">7</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8">8</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9">9</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10">10</a>
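To answer the original question (is the last page number 10?), the same filter can feed a list of integers. The following is only a minimal sketch: the helper name `page_numbers` is made up for illustration, the markup is a shortened copy of the snippet above, and it assumes the page links always contain plain digits.

```python
from bs4 import BeautifulSoup

# Shortened copy of the pagination markup from the question (pages 2, 3 and 10 only)
html = """<div class="jpag" id="srchpagination"><a rel='prev' class="dis"><span>‹‹</span> Prev</a><span class="act">1</span><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2' >2</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3' >3</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10' >10</a><a rel='next' href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>Next
<span>››</span></a></div>"""


def page_numbers(markup):
    """Hypothetical helper: return the numeric page links as a list of ints."""
    soup = BeautifulSoup(markup, "html.parser")
    numbers = []
    for a in soup.select("#srchpagination a[href]"):
        # Skip the Prev/Next links, which carry a rel attribute
        if a.get("rel"):
            continue
        text = a.get_text(strip=True)
        # Keep only links whose text is a plain number
        if text.isdigit():
            numbers.append(int(text))
    return numbers


pages = page_numbers(html)
last_page = pages[-1] if pages else None
print(last_page == 10)
```

The current page ("1") is a span rather than an anchor, so it never appears in the list. With a recent Beautiful Soup (4.7 or later, where select is backed by soupsieve), the selector `#srchpagination a[href]:not([rel])` should express the same filter without the explicit `rel` check.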