I'm learning BeautifulSoup, and the page I'm working with looks something like this:
HTML:
<div>
<table>
<tr>
<td>
<div>
<a name='abc'>....</a>
</div>
</td>
</tr>
</table>
</div>
<a name='pqr'>...</a>
<div>text1</div>
<div>text2</div>
<div>text3</div>
<a name='mno'>...</a>
<div>
<table>
<tr>
<td>
<div>
<a name='xyz'>....</a>
</div>
</td>
</tr>
</table>
</div>
Expected result:
<a name='pqr'>...</a>
<div>text1</div>
<div>text2</div>
<div>text3</div>
<a name='mno'>...</a>
In other words, I want to get everything up to the point where the 'a name='xyz'' tag is reached.
Answer 0: (score: 0)
You can make a function that matches every div element that has the pqr link as a previous sibling and the mno link as a next sibling:
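# soup is assumed to be a BeautifulSoup object built from the page's HTML.
def desired_divs(elm):
    # Match <div> tags that sit between the 'pqr' and 'mno' links as siblings.
    if elm and elm.name == "div" and \
            elm.find_previous_sibling("a", {"name": "pqr"}) and \
            elm.find_next_sibling("a", {"name": "mno"}):
        return elm

for div in soup.find_all(desired_divs):
    print(div.text)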
Prints:
text1
text2
text3
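For anyone who wants to run the function-filter approach end to end, here is a minimal, self-contained sketch. It assumes the built-in "html.parser" backend and a shortened copy of the markup from the question; the names HTML, between_pqr_and_mno and texts are only illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (assumption: same sibling layout).
HTML = """
<div><table><tr><td><div><a name='abc'>....</a></div></td></tr></table></div>
<a name='pqr'>...</a>
<div>text1</div>
<div>text2</div>
<div>text3</div>
<a name='mno'>...</a>
<div><table><tr><td><div><a name='xyz'>....</a></div></td></tr></table></div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def between_pqr_and_mno(tag):
    """Return True for <div> tags with 'pqr' before and 'mno' after, as siblings."""
    return (
        tag.name == "div"
        and tag.find_previous_sibling("a", {"name": "pqr"}) is not None
        and tag.find_next_sibling("a", {"name": "mno"}) is not None
    )


# find_all accepts a function and keeps every tag for which it returns True.
texts = [div.get_text() for div in soup.find_all(between_pqr_and_mno)]
print(texts)
```

The outer div elements before pqr and after mno are skipped because each of them is missing one of the two required sibling links.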
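Alternatively, you can find the starting a element, iterate over all of its following siblings, print their text along the way, and stop when you hit the ending a element: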
beginning = soup.find("a", {"name": "pqr"})
for elm in beginning.find_next_siblings():
    # Stop as soon as the closing 'mno' link is reached.
    if elm.name == "a" and elm.get("name") == "mno":
        break
    print(elm.text)
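Since the expected result in the question is the markup itself, including both a tags, the same sibling walk can collect tags instead of printing text. This is only a sketch under the same assumptions as the question's layout; HTML, collected and fragment are illustrative names, and str(tag) may normalise attribute quotes.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (assumption: same sibling layout).
HTML = """
<div><table><tr><td><div><a name='abc'>....</a></div></td></tr></table></div>
<a name='pqr'>...</a>
<div>text1</div>
<div>text2</div>
<div>text3</div>
<a name='mno'>...</a>
<div><table><tr><td><div><a name='xyz'>....</a></div></td></tr></table></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

start = soup.find("a", {"name": "pqr"})
collected = [start]

# Passing True matches every tag, so whitespace strings between tags are skipped.
for sibling in start.find_next_siblings(True):
    collected.append(sibling)
    # Keep the closing 'mno' link, then stop.
    if sibling.name == "a" and sibling.get("name") == "mno":
        break

fragment = "\n".join(str(tag) for tag in collected)
print(fragment)
```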
Answer 1: (score: 0)
I tried this and it worked:
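aref = soup.find('a', {"name": "abc"})
# find_all_next is the current name for the older findAllNext alias.
for i in aref.find_all_next():
    # Stop at the tag whose only attribute is name='xyz'.
    if i.attrs == {'name': 'xyz'}:
        break
    else:
        print(i.text)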
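One thing worth knowing about this approach: find_all_next walks the document in order and also yields tags nested inside other tags, so a container's text can show up along with the text of its children. The sketch below just records which tags are visited between abc and xyz, using a shortened copy of the question's markup; HTML and visited are illustrative names.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (assumption: same nesting).
HTML = """
<div><table><tr><td><div><a name='abc'>....</a></div></td></tr></table></div>
<a name='pqr'>...</a>
<div>text1</div>
<div>text2</div>
<div>text3</div>
<a name='mno'>...</a>
<div><table><tr><td><div><a name='xyz'>....</a></div></td></tr></table></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

start = soup.find("a", {"name": "abc"})
visited = []

# True matches every tag that comes after 'start' in document order.
for tag in start.find_all_next(True):
    if tag.name == "a" and tag.get("name") == "xyz":
        break
    visited.append(tag.name)

print(visited)
```

The trailing div, table, tr, td and div entries are the wrappers around the xyz link, which are reached before the link itself. If only the top-level elements are wanted, iterating over siblings as in the other answer avoids them.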