My HTML structure is:
<div class="layout4-background">
<h6 class="game">Game1. How to get all listings below and assign to class"game"?</h6>
<ul>
<li class="listing">
</ul>
<ul>
<li class="listing">
</ul>
<ul>
<li class="listing">
</ul>
<h6 class="game">Game2. How to get all listings below and assign to class"game?</h6>
<ul>
<li class="listing">
</ul>
<h6 class="game">Game3. How to get all listings below and assign to class"game?</h6>
<ul>
<li class="listing">
</ul>
</div>
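This is one div block. Basically, I need to build a list of listings for each h6 element: the first h6 has 3 listings, the second has 1, and the third has 1. Is there a way to do this with BeautifulSoup? Thanks.
Answer 0 (score: 0)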
You can iterate over the result of .find_next_siblings(), taking only the <ul> elements:
from itertools import takewhile

# soup is the already-parsed BeautifulSoup document
div = soup.find('div', class_='layout4-background')
for header in div.find_all('h6'):
    print(header.get_text())
    # text=False skips text nodes; takewhile stops at the first non-<ul> sibling
    listings = takewhile(lambda t: t.name == 'ul',
                         header.find_next_siblings(text=False))
    for listing in listings:
        pass  # do something with listing
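The find_next_siblings() search with text=False finds every following sibling that is not a text node, so the whitespace between the tags is skipped. itertools.takewhile() then lets you take the next elements for as long as they are <ul> tags, which means it stops at the following <h6>.
Demo: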
>>> from bs4 import BeautifulSoup
>>> from itertools import takewhile
>>> soup = BeautifulSoup('''\
... <div class="layout4-background">
... <h6 class="game">Game1. How to get all listings below and assign to class"game"?</h6>
... <ul>
... <li class="listing">
... </ul>
... <ul>
... <li class="listing">
... </ul>
... <ul>
... <li class="listing">
... </ul>
... <h6 class="game">Game2. How to get all listings below and assign to class"game?</h6>
... <ul>
... <li class="listing">
... </ul>
... <h6 class="game">Game3. How to get all listings below and assign to class"game?</h6>
... <ul>
... <li class="listing">
... </ul>
... </div>
... ''')
>>> div = soup.find('div', class_='layout4-background')
>>> for header in div.find_all('h6'):
...     print(header.get_text())
...     listings = takewhile(lambda t: t.name == 'ul',
...                          header.find_next_siblings(text=False))
...     print('Listings found:', len(list(listings)))
...
Game1. How to get all listings below and assign to class"game"?
Listings found: 3
Game2. How to get all listings below and assign to class"game?
Listings found: 1
Game3. How to get all listings below and assign to class"game?
Listings found: 1
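If the goal is an actual list per heading rather than a count, something along these lines might work. This is only a sketch and not part of the original answer: the function name group_listings and the item text in HTML are made up for illustration, and the parser is assumed to be the built-in html.parser. Instead of text=False it filters header.next_siblings with isinstance(node, Tag), which is a different way of skipping the whitespace text nodes.

```python
from itertools import takewhile

from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample: same shape as the question's markup, with made-up item text.
HTML = """
<div class="layout4-background">
  <h6 class="game">Game1</h6>
  <ul><li class="listing">a</li></ul>
  <ul><li class="listing">b</li></ul>
  <ul><li class="listing">c</li></ul>
  <h6 class="game">Game2</h6>
  <ul><li class="listing">d</li></ul>
  <h6 class="game">Game3</h6>
  <ul><li class="listing">e</li></ul>
</div>
"""


def group_listings(html):
    """Map each <h6> heading's text to the text of the listings under it."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="layout4-background")
    groups = {}
    for header in div.find_all("h6", class_="game"):
        # Keep element siblings only; the whitespace between tags is a text node.
        tags = (node for node in header.next_siblings if isinstance(node, Tag))
        # Stop at the first sibling that is not a <ul> (normally the next <h6>).
        uls = takewhile(lambda tag: tag.name == "ul", tags)
        groups[header.get_text(strip=True)] = [
            li.get_text(strip=True)
            for ul in uls
            for li in ul.find_all("li", class_="listing")
        ]
    return groups


if __name__ == "__main__":
    for game, listings in group_listings(HTML).items():
        print(game, len(listings), listings)
```

The dictionary keeps the headings in document order, so each game ends up paired with exactly the listings that sit between it and the next heading.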