I want to extract the text from a div up to the <br> tag. How can I do that?
For example,
<div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">Watched a video that has been removed<br>Aug 17, 2018, 2:34:28 PM UTC</div>
I used this:
print content.text
It outputs:
Watched a video that has been removedAug 17, 2018, 2:34:28 PM UTC
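But the expected output is: Watched a video that has been removed
I don't want the text that comes after the <br> tag.
Also, to get the text after the <br>, I can try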
content.find('br').text
Now I am thinking of doing the following:
result= (content.find('br').text).replace((content.find('br').text),'')
Is there a better way to do this with BeautifulSoup that avoids the extra string replacement?
Answer 0 (score: 2)
Use .contents[0] to take the first child node of the div, which is the text before the <br>:
from bs4 import BeautifulSoup
html="""<div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">Watched a video that has been removed<br>Aug 17, 2018, 2:34:28 PM UTC</div>"""
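# Name the parser explicitly so bs4 does not warn and results are consistent across machines
soup = BeautifulSoup(html, "html.parser")
# contents[0] is the first child of the div: the text node before <br>
print(soup.find("div").contents[0])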
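An alternative, in case the text before the <br> is not guaranteed to be the very first child of the div, is to navigate relative to the <br> tag itself. The following is only a sketch under the assumption that the div contains exactly one <br> with a plain text node on each side, as in the question's sample; the variable names `before` and `after` are illustrative.

```python
from bs4 import BeautifulSoup

html = """<div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">Watched a video that has been removed<br>Aug 17, 2018, 2:34:28 PM UTC</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")
br = div.find("br")

# Text node immediately before <br> (assumes one exists)
before = br.previous_sibling
# Text node immediately after <br>, i.e. the timestamp in this sample
after = br.next_sibling

print(before)
print(after)
```

This also covers the second part of the question: `br.next_sibling` gives the text after the <br> directly, so no string replacement is needed for either half.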