How to iterate over the tags inside a div tag

Time: 2016-06-14 08:52:27

Tags: html python-2.7 beautifulsoup

I am new to BeautifulSoup. This is the HTML snippet I am interested in:

<div class="jpag" id="srchpagination"><a rel='prev' class="dis"><span>&lsaquo;&lsaquo;</span> Prev</a><span class="act">1</span><a 
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2' >2</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3' >3</a><a 
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4' >4</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5' >5</a><a 
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6' >6</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7' >7</a><a 
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8' >8</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9' >9</a><a 
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10' >10</a><a rel='next' href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>Next 
<span>&rsaquo;&rsaquo;</span></a></div>

I want to check whether the last page number in the 'a' tags is 10. I was able to get the enclosing div with this command:

atags1 = bSoup.find('div', attrs={'class': 'jpag'})

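Now I want to iterate over the 'a' tags that do not have an attribute such as rel="prev" or rel="next", so that I only iterate over the 'a' tags that carry page numbers. Please help. Thanks in advance.

1 Answer:

Answer 0 (score: 2)

There are many ways to do this. One simple way is to select the anchors inside the div and filter out any that have a rel attribute: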

html = """<div class="jpag" id="srchpagination"><a rel='prev' class="dis"><span>&lsaquo;&lsaquo;</span> Prev</a><span class="act">1</span><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2' >2</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3' >3</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4' >4</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5' >5</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6' >6</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7' >7</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8' >8</a><a href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9' >9</a><a
href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10' >10</a><a rel='next' href='http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2'>Next
<span>&rsaquo;&rsaquo;</span></a></div>"""

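from bs4 import BeautifulSoup

# Name the parser explicitly; omitting it makes bs4 guess and emit a warning.
soup = BeautifulSoup(html, "html.parser")

# Select every anchor with an href inside the element whose id is srchpagination.
for a in soup.select("#srchpagination a[href]"):
    # Skip the Prev/Next links, which are the only anchors carrying a rel attribute.
    if not a.get("rel"):
        print(a)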

Which gives you:

<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2">2</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-3">3</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-4">4</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-5">5</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-6">6</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-7">7</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-8">8</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-9">9</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10">10</a>
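
To get from those anchors to the check the question actually asks for (is the last page number 10?), something along these lines may work. It is only a sketch: the `page_numbers` helper and the shortened `html` string are made up for illustration, and it assumes that every numbered link has purely numeric text and that the links appear in page order.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the pagination markup in the question (assumption:
# the real page has the same structure, just with more numbered links).
html = """<div class="jpag" id="srchpagination">
<a rel="prev" class="dis"><span>&lsaquo;&lsaquo;</span> Prev</a>
<span class="act">1</span>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2">2</a>
<a href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-10">10</a>
<a rel="next" href="http://www.justdial.com/Bangalore/Carpenters/ct-310711/page-2">Next <span>&rsaquo;&rsaquo;</span></a>
</div>"""


def page_numbers(markup):
    """Hypothetical helper: return the page numbers found in the numbered links."""
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div", attrs={"class": "jpag"})
    if div is None:
        return []
    numbers = []
    for a in div.find_all("a", href=True):
        # Prev/Next are the anchors that carry a rel attribute; skip them.
        if a.has_attr("rel"):
            continue
        text = a.get_text(strip=True)
        # Ignore anything that is not a plain number instead of crashing in int().
        if text.isdigit():
            numbers.append(int(text))
    return numbers


pages = page_numbers(html)
last_page_is_10 = bool(pages) and pages[-1] == 10
print(pages, last_page_is_10)
```

Note that the current page (1 in the snippet) is a span rather than a link, so it does not show up in the list. If the links are not guaranteed to be in order, comparing `max(pages)` with 10 instead of `pages[-1]` might be the safer check.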