Extracting part of the text with BeautifulSoup

Time: 2019-05-15 09:34:05

Tags: python beautifulsoup

How can I extract the text that comes after the <br/> tag? I want only that text, not anything inside the <strong> tag.

<p><strong>A title</strong><br/>
Text I want which also
includes linebreaks.</p>

I tried code like this:

text_content = paragraph.get_text(separator='strong/').strip()

But this also includes the text inside the <strong> tag.

In case it is unclear, the paragraph variable is a bs4.element.Tag.

Any help is appreciated!

2 answers:

Answer 0: (score: 1)

If you have the <p> tag, find the <br> inside it and use .next_siblings:

import bs4

html = '''<p><strong>A title</strong><br/>
Text I want which also
includes linebreaks.</p>'''

soup = bs4.BeautifulSoup(html, 'html.parser')

paragraph = soup.find('p')
text_wanted = ''.join(paragraph.find('br').next_siblings)

print(text_wanted)
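Note that ''.join(...) only works here because every sibling after the <br> is a plain string. If the trailing text contained inline tags, join would raise a TypeError. A minimal sketch of a more tolerant variant follows. The helper name text_after_br and the <em> element in the sample HTML are made up for illustration and are not part of the original question.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def text_after_br(paragraph):
    """Return the text following the first <br> inside `paragraph` (hypothetical helper)."""
    br = paragraph.find('br')
    if br is None:
        # No <br> in this paragraph, so there is nothing "after" it.
        return ''
    parts = []
    for sibling in br.next_siblings:
        if isinstance(sibling, Tag):
            # Inline tags such as <em>: keep their text, drop the markup.
            parts.append(sibling.get_text())
        elif isinstance(sibling, NavigableString):
            parts.append(str(sibling))
    return ''.join(parts).strip()


# Sample HTML adapted from the question, with an <em> added as an assumption.
html = '''<p><strong>A title</strong><br/>
Text I want which also
<em>includes</em> linebreaks.</p>'''

soup = BeautifulSoup(html, 'html.parser')
print(text_after_br(soup.find('p')))
```

The text inside <strong> is still excluded, because it comes before the <br> and is therefore never among its next siblings.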

Answer 1: (score: 1)

Find the <br> tag and use next_element:

from bs4 import BeautifulSoup

data='''<p><strong>A title</strong><br/>
Text I want which also
includes linebreaks.</p>'''

soup = BeautifulSoup(data, 'html.parser')
item = soup.find('p').find('br').next_element
print(item)
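One caveat: next_element returns only the single node parsed immediately after the <br>. That is enough for the HTML in the question, where everything after the <br> is one string, but it would stop early if the trailing text contained inline markup. The sketch below illustrates the difference. The <em> element in the sample data is an assumption added for the demonstration.

```python
from bs4 import BeautifulSoup

# Sample data with inline markup after the <br/> (not from the original question).
data = '''<p><strong>A title</strong><br/>
Text with <em>inline markup</em> after it.</p>'''

soup = BeautifulSoup(data, 'html.parser')
br = soup.find('p').find('br')

# next_element: only the first node after the <br>, here the string before <em>.
first_node = br.next_element

# next_siblings: every node after the <br> at the same level; tags are flattened to text.
all_text = ''.join(
    node if isinstance(node, str) else node.get_text()
    for node in br.next_siblings
)

print(repr(str(first_node)))
print(repr(all_text))
```

For paragraphs that are known to contain only plain text after the <br>, next_element is the shorter option. Iterating over next_siblings is the safer choice when the markup may vary.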