Determining the name of the next element while looping

Time: 2014-08-03 20:02:30

Tags: python beautifulsoup

I want to loop over a list of HTML elements from BeautifulSoup, but for each element I also want to check the name of the next element in the tree.

from bs4 import BeautifulSoup

html_doc = """
<!DOCTYPE html>
<html>
<body>

<div id="main">
  <p>1</p>
  <p>2</p>
  <b>3</b>
</div>

</body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for p in soup.find(id="main").find_all("p"):
    print(p.get_text())
    # p.next_sibling is the whitespace text between tags, not the next tag
    if p.next_sibling.name == 'p':
        print("TRUE")

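Of course, this does not work: the next sibling inside the loop is a whitespace text node, not a tag. Is there a way to get the name of the next element in the original tree?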

1 Answer:

Answer 0: (score: 0)

from bs4 import BeautifulSoup

html = """
<div id="main">
  <p>1</p>
  <p>2</p>
  <b>3</b>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
elements = soup.find(id='main').find_all('p')

for i, p in enumerate(elements):
    print(p.text, end=' ')
    # Look ahead by position; list.index() could pick an earlier, equal-looking tag
    if i + 1 < len(elements):
        print('(next tag is: %s)' % elements[i + 1].name)
    else:
        print("(this was the last element with tag 'p')")
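Note that the loop above only looks ahead within the list of p tags, so it never reports the b tag that follows the second p in the tree. If the goal is the next tag in the original tree, one possible approach is find_next_sibling(True), which skips whitespace text nodes and returns the next sibling tag, or None if there is none. This is a minimal sketch using the question's markup; the names results and next_name are only illustrative.

```python
from bs4 import BeautifulSoup

html_doc = """
<div id="main">
  <p>1</p>
  <p>2</p>
  <b>3</b>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

results = []  # illustrative: collects (text, name of next sibling tag)
for p in soup.find(id="main").find_all("p"):
    # True matches any tag, so whitespace text nodes are skipped
    nxt = p.find_next_sibling(True)
    next_name = nxt.name if nxt is not None else None
    results.append((p.get_text(), next_name))
    print(p.get_text(), "-> next tag:", next_name)
```

With this markup the first p is followed by another p and the second by b, so a check such as next_name == 'p' can replace the p.next_sibling.name comparison from the question.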