How to extract text up to the <br/> tag in Beautiful Soup

Asked: 2018-10-24 18:20:24

Tags: python beautifulsoup html-parsing

I want to extract the text from a div, but only up to the <br> tag. How can I do that?

For example,

<div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">Watched a video that has been removed<br>Aug 17, 2018, 2:34:28 PM UTC</div>

I used this:

print(content.text)

It outputs:

Watched a video that has been removedAug 17, 2018, 2:34:28 PM UTC

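But the expected output is: Watched a video that has been removed

I don't want the text that comes after the <br>.

Also, I can get the text after the <br> with this: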

content.find('br').text

Now I am thinking of doing the following:

result = content.text.replace(content.find('br').text, '')

Is there a better way to do this with BeautifulSoup that avoids the extra string replacement?

1 Answer:

Answer 0 (score: 2)

Take the first child of the div, which is the text node that precedes the <br>:

from bs4 import BeautifulSoup

html="""<div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">Watched a video that has been removed<br>Aug 17, 2018, 2:34:28 PM UTC</div>"""
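# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")
# .contents[0] is the first child of the div: the text node before <br>
print(soup.find("div").contents[0])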
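
Note that .contents[0] only returns the first child, so it would cut the text short if the part before the <br> contained nested tags such as <b>. A slightly more general variant is sketched below; it is only one possible approach, and the helper name text_before_first_br and the extra <b> tag in the sample HTML are made up for illustration and are not part of the original question. It assumes the div's direct children are plain text nodes and tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Sample markup based on the question; the <b> tag is added here for illustration
html = """<div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">Watched a <b>video</b> that has been removed<br>Aug 17, 2018, 2:34:28 PM UTC</div>"""


def text_before_first_br(tag):
    """Hypothetical helper: join the text of tag's direct children up to its first <br>."""
    parts = []
    for child in tag.children:
        # Stop as soon as the first <br> among the direct children is reached
        if isinstance(child, Tag) and child.name == "br":
            break
        # Nested tags contribute their text; text nodes are converted with str()
        parts.append(child.get_text() if isinstance(child, Tag) else str(child))
    return "".join(parts).strip()


soup = BeautifulSoup(html, "html.parser")
result = text_before_first_br(soup.find("div"))
print(result)
```

For the exact HTML in the question, where the text before the <br> is a single text node, this should give the same string as .contents[0].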