I have online text data like this:
plain_text = """<a href="/url?q=https://www.aarnoldmovingcompany.com/contact-us/&sa=U&ved=0ahUKEwCgAMAA&usg=AOvVaw1pasRFOwk">
</b> Moving Louisville - Headquarters.<br>
commercial moving services nationwide. Visit our website today to learn more!<br><div class="osl">
<br>
5200 Interchange Way Louisville, KY 40229.<br>
... <b> A. Arnold</b>"""
I am trying to extract the text that follows the <br> tags in this text, so the output would look like:
commercial moving services nationwide. Visit our website today to learn more
5200 Interchange Way Louisville, KY 40229.
This does not work for me:
soup=BeautifulSoup(plain_text,"lxml")
out=soup.find_all('br')
It returns only the empty tags:
[<br/>,
<br/>]
Answer 0 (score: 0)
You can use next_sibling; see the code below.
from bs4 import BeautifulSoup
text = """<a href="/url?q=https://www.aarnoldmovingcompany.com/contact-us/&sa=U&ved=0ahUKEwCgAMAA&usg=AOvVaw1pasRFOwk">
</b> Moving Louisville - Headquarters.<br>
commercial moving services nationwide. Visit our website today to learn more!<br><div class="osl">
<br>
5200 Interchange Way Louisville, KY 40229.<br>
... <b> A. Arnold</b>"""
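soup = BeautifulSoup(text, 'lxml')
# soup.br is the first <br>; its next sibling is the text node that follows it
name = soup.br.next_sibling
# two steps forward in document order: the second <br>, then the <div class="osl">
address = name.next_element.next_element.text.strip()
print(name, '\n', address)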
Output
commercial moving services nationwide. Visit our website today to learn more!
5200 Interchange Way Louisville, KY 40229.
... A. Arnold
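If the number of <br> tags is not known in advance, hard-coding the hops from the first one gets fragile. As a rough alternative, here is a dependency-free sketch that uses the standard library's html.parser instead of BeautifulSoup and collects whatever non-empty text comes directly after each <br>. The class name TextAfterBr and its attributes are made up for illustration, and "directly after" is my own assumption about what is wanted: any other tag between the <br> and the text cancels the match.

```python
from html.parser import HTMLParser


class TextAfterBr(HTMLParser):
    """Hypothetical helper: collect the text that directly follows each <br>."""

    def __init__(self):
        super().__init__()
        self.after_br = False  # True while the most recent tag seen was <br>
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Only a <br> opens the window; any other start tag closes it.
        self.after_br = (tag == "br")

    def handle_endtag(self, tag):
        # An end tag between <br> and the text also closes the window.
        self.after_br = False

    def handle_data(self, data):
        text = data.strip()
        if self.after_br and text:
            self.lines.append(text)
            self.after_br = False


html = """<a href="/url?q=https://www.aarnoldmovingcompany.com/contact-us/&sa=U&ved=0ahUKEwCgAMAA&usg=AOvVaw1pasRFOwk">
</b> Moving Louisville - Headquarters.<br>
commercial moving services nationwide. Visit our website today to learn more!<br><div class="osl">
<br>
5200 Interchange Way Louisville, KY 40229.<br>
... <b> A. Arnold</b>"""

parser = TextAfterBr()
parser.feed(html)
parser.close()

for line in parser.lines:
    print(line)
```

On the sample above this collects the two lines asked for, plus a third entry "..." because that text also sits right after the last <br>. Filter it out afterwards if it is not wanted.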