I'm having trouble getting data from a page. Here is part of my code:
for result in results:
    street = result.find('p', attrs={'class': 'size16'}).text
    records.append(street)
    print(street)
The page's HTML:
<div class="media-body pt5 pb10">
<div class="mb15">
<span class="map-item-city block mb0 colorgreen">City</span>
<p class="small mb20"> </p>
<p class="size16">street 98<br>phone. 22 721-56-70</p>
</div>
<div class="colorblack"><strong>open</strong></div>
<div class="mb20 size16">Mon.-Fr. 07.30-15.30</div>
<div class="mb15 ">
The output of my code:
ul. Bema 2phone. (32) 745 72 66-69 Wroclaw None
ul. 1 Maja 22/Vphone. 537-943-969 Olawa <p class="small mb20 colorgreen">Placowka partnerska</p>
I want to split off or remove the text after the <br> tag. I only need the street.
<p class="size16">street 98<br>phone. 22 721-56-70</p>
Can you help me?
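The street and phone number run together because .text joins every string inside the <p>, and <br> itself contributes no text. The sketch below shows this on the markup from the question and one alternative to the previous_sibling answer: get_text(separator=...) puts a separator between the strings, so the part before <br> can be split off. It assumes a recent bs4 and uses the built-in "html.parser" (an assumption, chosen so no extra parser package is needed); the "|" separator is an arbitrary choice.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = '<p class="size16">street 98<br>phone. 22 721-56-70</p>'
soup = BeautifulSoup(html, "html.parser")
p = soup.find('p', attrs={'class': 'size16'})

# .text concatenates every string inside <p>; <br> adds nothing,
# so the street and the phone number are glued together
print(p.text)  # street 98phone. 22 721-56-70

# get_text(separator=...) inserts the separator between the strings,
# so the first piece is the text before <br>
street = p.get_text(separator='|').split('|')[0].strip()
print(street)  # street 98
```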
Answer 0 (score: 1)
Use previous_sibling, like this:
from bs4 import BeautifulSoup
html = """
<div class="media-body pt5 pb10">
<div class="mb15">
<span class="map-item-city block mb0 colorgreen">Bronisze</span>
<p class="small mb20"> </p>
<p class="size16">Poznańska 98<br>tel. 22 721-56-70</p>
</div>
<div class="colorblack"><strong>Godziny otwarcia</strong></div>
<div class="mb20 size16">Pn.-Pt. 07.30-15.30</div>
<div class="mb15 ">
"""
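# "lxml" requires the lxml package; "html.parser" works without it
result = BeautifulSoup(html, "lxml")
# The first <br>; its previous sibling is the text node just before it
br = result.find('br')
print(br.previous_sibling)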
Or, if you want to narrow it down a little by searching inside the <p class="size16"> element first:
street = result.find('p', attrs={'class': 'size16'}).find('br').previous_sibling
print(street)
Output (in both cases):
Poznańska 98
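To fit this into the loop from the question, here is a sketch. The question does not show how results is built, so treating each result as one div.media-body block is an assumption, as is the fallback for a <p> that has no <br> (in that case previous_sibling is unavailable, so the first non-blank string is used instead). It uses the built-in "html.parser" so no extra package is needed.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample markup from the answer, with the outer <div> closed
html = """
<div class="media-body pt5 pb10">
<div class="mb15">
<span class="map-item-city block mb0 colorgreen">Bronisze</span>
<p class="small mb20"> </p>
<p class="size16">Poznańska 98<br>tel. 22 721-56-70</p>
</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Assumption: each result is one div.media-body block
results = soup.find_all('div', class_='media-body')

records = []
for result in results:
    p = result.find('p', attrs={'class': 'size16'})
    if p is None:
        continue  # this block has no address paragraph
    br = p.find('br')
    before = br.previous_sibling if br is not None else None
    if isinstance(before, NavigableString):
        # Text node directly before <br>: the street
        street = before.strip()
    else:
        # No usable <br>: fall back to the first non-blank string in <p>
        street = next(p.stripped_strings, '')
    records.append(street)
    print(street)
```

The isinstance check guards against a <br> that is the first child (previous_sibling is None) or that follows another tag, where printing previous_sibling would give "None" or markup instead of the street.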
From the documentation, https://www.crummy.com/software/BeautifulSoup/bs4/doc/:
.next_sibling and .previous_sibling
You can use .next_sibling and .previous_sibling to navigate between page elements that are on the same level of the parse tree:
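Applied to the markup in this question, the two text nodes on either side of <br> are siblings of the <br> tag, so .previous_sibling gives the street and .next_sibling gives the phone line. A minimal sketch, assuming a recent bs4 where html.parser treats <br> as an empty element (the parser choice is an assumption; the answer above uses "lxml"):

```python
from bs4 import BeautifulSoup

html = '<p class="size16">Poznańska 98<br>tel. 22 721-56-70</p>'
soup = BeautifulSoup(html, "html.parser")
br = soup.find('br')

# The strings on either side of <br> sit at the same level of the parse tree
street = br.previous_sibling  # text before <br>
phone = br.next_sibling       # text after <br>
print(street)  # Poznańska 98
print(phone)   # tel. 22 721-56-70
```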