I want to loop over a list of HTML elements with BeautifulSoup, but for each element I also want to check the name of the next element in the tree:
from bs4 import BeautifulSoup
html_doc = """
<!DOCTYPE html>
<html>
<body>
<div id="main">
<p>1</p>
<p>2</p>
<b>3</b>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
for p in soup.find(id="main").find_all("p"):
    print(p.get_text())
    if p.next_sibling.name == 'p':
        print("TRUE")
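Of course, this doesn't work: p.next_sibling is not the next tag but the whitespace text (the newline) between the tags, so its name is never 'p'. Is it possible to get the name of the next element in the original tree?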
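A minimal sketch of one possible fix (a suggestion, not code from the original post): `Tag.find_next_sibling()` called with no arguments skips text nodes such as the newline and returns the next sibling that is a tag, or `None` if there is none. The helper name `p_followed_by_p` is hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup

html_doc = """
<div id="main">
<p>1</p>
<p>2</p>
<b>3</b>
</div>
"""


def p_followed_by_p(html):
    """Hypothetical helper: return (text, next tag is <p>) for each <p> in #main."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for p in soup.find(id="main").find_all("p"):
        # find_next_sibling() skips "\n" text nodes and returns the next Tag,
        # or None when p is the last tag among its siblings.
        nxt = p.find_next_sibling()
        results.append((p.get_text(), nxt is not None and nxt.name == "p"))
    return results


for text, followed_by_p in p_followed_by_p(html_doc):
    print(text)
    if followed_by_p:
        print("TRUE")
```

With the question's HTML this prints `1`, `TRUE`, `2`: only the first `<p>` is followed by another `<p>`.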
Answer 0 (score: 0)
from bs4 import BeautifulSoup
html = """
<div id="main">
<p>1</p>
<p>2</p>
<b>3</b>
</div>
"""
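soup = BeautifulSoup(html, 'html.parser')
elements = soup.find(id='main').find_all('p')
# enumerate() yields each element's position directly; elements.index(p)
# rescans the list and returns the first tag that compares equal, which is
# wrong when two tags have identical content.
for i, p in enumerate(elements):
    print(p.text, end=' ')
    try:
        next_ = elements[i + 1]
        print('(next tag is: %s)' % next_.name)
    except IndexError:
        print("(this was the last element with tag 'p')")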
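Because `elements` above contains only `<p>` tags, `next_.name` is always `p`; the `<b>` that follows the second paragraph is never reported. If the next tag in the tree is what matters, one sketch (an alternative, not part of the answer above) is to take the direct child tags of `#main` with `find_all(recursive=False)`, which excludes text nodes, and pair each with its successor. The helper `next_tag_names` and its parameters are hypothetical.

```python
from bs4 import BeautifulSoup

html = """
<div id="main">
<p>1</p>
<p>2</p>
<b>3</b>
</div>
"""


def next_tag_names(html, container_id="main", tag="p"):
    """Hypothetical helper: map each <tag> child of #container_id to the next tag's name."""
    soup = BeautifulSoup(html, "html.parser")
    # recursive=False keeps only direct child tags, in document order;
    # text nodes such as "\n" are not included.
    children = soup.find(id=container_id).find_all(recursive=False)
    # Pair each child with its successor; the last child is paired with None.
    pairs = zip(children, children[1:] + [None])
    return [
        (el.get_text(), nxt.name if nxt is not None else None)
        for el, nxt in pairs
        if el.name == tag
    ]


for text, name in next_tag_names(html):
    print(text, "(next tag is: %s)" % name if name else "(last tag in #main)")
```

This walks the children once instead of calling a search per element, and prints `1 (next tag is: p)` followed by `2 (next tag is: b)`.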