Question

How can I extract the text after the <br/> tag? I only want that text, not anything inside the <strong> tag.

<p><strong>A title</strong><br/>
Text I want which also
includes linebreaks.</p>

I tried code like this:

text_content = paragraph.get_text(separator='strong/').strip()

But this also includes the text inside the <strong> tag.

In case it is not clear, the paragraph variable is a bs4.element.Tag.

Any help is appreciated!

Answer 1

If you have the <p> tag, find the <br> inside it and use .next_siblings:

import bs4

html = '''<p><strong>A title</strong><br/>
Text I want which also
includes linebreaks.</p>'''

soup = bs4.BeautifulSoup(html, 'html.parser')

paragraph = soup.find('p')
text_wanted = ''.join(paragraph.find('br').next_siblings)

print(text_wanted)
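
Note that ''.join(...) over .next_siblings only works while every sibling after the <br> is a plain string; if a tag such as <em> follows, the join raises a TypeError. Below is a minimal sketch of one way to handle that case. The helper name text_after_br and the <em> in the sample HTML are assumptions added for illustration, not part of the original answer.

```python
import bs4
from bs4.element import Tag


def text_after_br(paragraph):
    """Return the text of every node following the first <br> in `paragraph`.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    br = paragraph.find('br')
    if br is None:
        return ''  # no <br> found, so there is nothing "after" it
    parts = []
    for node in br.next_siblings:
        if isinstance(node, Tag):
            # Nested tag: take its text content rather than the tag itself.
            parts.append(node.get_text())
        else:
            # NavigableString: convert to a plain str.
            parts.append(str(node))
    return ''.join(parts).strip()


# Sample HTML from the question, with an <em> added for illustration.
html = '''<p><strong>A title</strong><br/>
Text I want which also
<em>includes</em> linebreaks.</p>'''

soup = bs4.BeautifulSoup(html, 'html.parser')
print(text_after_br(soup.find('p')))
```

The text inside <strong> is still skipped because it comes before the <br>, while the text inside <em> is kept because it comes after.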

Answer 2

Find the <br> tag and use next_element:

from bs4 import BeautifulSoup

data='''<p><strong>A title</strong><br/>
Text I want which also
includes linebreaks.</p>'''

soup=BeautifulSoup(data,'html.parser')
item=soup.find('p').find('br').next_element
print(item)
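
One caveat worth checking: next_element returns only the single node parsed immediately after the <br>, so it covers the question's example but would stop early if the trailing text contained further tags. The sketch below compares it with next_siblings; the <em> in the sample data is an assumption added to show the difference.

```python
from bs4 import BeautifulSoup

# Sample data with an <em> added after the <br/> for illustration.
data = '''<p><strong>A title</strong><br/>
Text with <em>markup</em> after it.</p>'''

soup = BeautifulSoup(data, 'html.parser')
br = soup.find('p').find('br')

# next_element: just the first node after <br/> (here, a text node).
first_node = br.next_element

# next_siblings: every node after <br/> within the same <p>.
all_nodes = list(br.next_siblings)

print(repr(str(first_node)))
print(len(all_nodes))
```

Here first_node holds only the text up to the <em>, while all_nodes contains the leading text, the <em> tag, and the trailing text.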

Extracting part of the text with BeautifulSoup