I want to extract the text "THIS IS THE TEXT I WANT TO EXTRACT" from the excerpt below. Does anyone have any suggestions? Thanks!
<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
THIS IS THE TEXT I WANT TO EXTRACT</p>
Answer 0 (score: 0)
from bs4 import BeautifulSoup
html = """<span class="cw-type__h2 Ingredients-title">Ingredients</span><p>THIS IS THE TEXT I WANT TO EXTRACT</p>"""
soup = BeautifulSoup(html,'lxml')
print(soup.p.text)
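Note that soup.p returns the first p in the whole document, so it only works when the target paragraph comes first. If the real page has other paragraphs before it, one option is to find the span by its class and then move to the following p. This is a minimal sketch: the extra "unrelated" paragraph is made up for illustration, and it uses the built-in 'html.parser' so lxml is not needed.

```python
from bs4 import BeautifulSoup

# The first <p> is an invented example of other markup on the page.
html = """
<p>Some unrelated paragraph</p>
<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
THIS IS THE TEXT I WANT TO EXTRACT</p>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches an element if any one of its classes equals the value.
title = soup.find("span", class_="Ingredients-title")

# find_next_sibling skips whitespace text nodes and returns the next <p> sibling.
paragraph = title.find_next_sibling("p")

# get_text(strip=True) drops the leading newline inside the <p>.
text = paragraph.get_text(strip=True)
print(text)
```

Keep in mind that find_next_sibling("p") returns the next p sibling even if other tags sit between it and the span, so it is slightly looser than a strict "immediately after" match.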
Answer 1 (score: 0)
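Assuming there may be more HTML around this snippet, I would combine the span's class with the adjacent sibling combinator (+) and a p type selector to target the right p tag: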
from bs4 import BeautifulSoup as bs
html = '''
<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
THIS IS THE TEXT I WANT TO EXTRACT</p>
'''
soup = bs(html, 'lxml')
print(soup.select_one('.Ingredients-title + p').text.strip())
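One caveat: select_one returns None when nothing matches, so calling .text on the result raises AttributeError if the page layout changes. A small guard avoids that. This is only a sketch; the helper name text_after_title is hypothetical, and it uses 'html.parser' instead of 'lxml' to avoid the extra dependency.

```python
from bs4 import BeautifulSoup


def text_after_title(html, selector=".Ingredients-title + p"):
    """Return the stripped text of the first match for selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(selector)
    if tag is None:
        # No element with class Ingredients-title is directly followed by a <p>.
        return None
    return tag.get_text(strip=True)


html = '''
<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
THIS IS THE TEXT I WANT TO EXTRACT</p>
'''

found = text_after_title(html)
missing = text_after_title("<div>no ingredients here</div>")
print(found)
print(missing)
```

The first call prints the paragraph text; the second prints None instead of raising.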