How to extract the text in between tags using BeautifulSoup?

Time: 2019-10-19 01:00:42

Tags: python web-scraping beautifulsoup

I want to extract the text "THIS IS THE TEXT I WANT TO EXTRACT" from the excerpt below. Does anyone have any suggestions? Thanks!

<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
                                THIS IS THE TEXT I WANT TO EXTRACT</p>

2 answers:

Answer 0: (score: 0)

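from bs4 import BeautifulSoup

# Markup from the question, collapsed onto a single line
html = """<span class="cw-type__h2 Ingredients-title">Ingredients</span><p>THIS IS THE TEXT I WANT TO EXTRACT</p>"""
# Parse with the lxml parser (requires the lxml package to be installed)
soup = BeautifulSoup(html, 'lxml')
# soup.p is the first <p> in the document; .text returns its text content
print(soup.p.text)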
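
The snippet above removes the line break and indentation that the original excerpt has inside the `<p>` tag. As a minimal sketch, assuming the markup is kept exactly as posted, `get_text(strip=True)` can trim that whitespace. The `html.parser` backend is swapped in here only so the example runs without lxml; it is not what the answer uses.

```python
from bs4 import BeautifulSoup

# The excerpt as posted in the question, including the line break and indentation.
html = """<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
                                THIS IS THE TEXT I WANT TO EXTRACT</p>"""

# Assumption: "html.parser" (bundled with Python) stands in for 'lxml' here.
soup = BeautifulSoup(html, "html.parser")

raw = soup.p.text                    # keeps the leading newline and spaces
clean = soup.p.get_text(strip=True)  # strips the surrounding whitespace

print(repr(raw))
print(clean)
```

Note that `soup.p` always refers to the first `<p>` in the document, so this only works when the wanted paragraph comes first.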

Answer 1: (score: 0)

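Assuming there may be more HTML on the page, I would use the class of the span, combined with an adjacent sibling combinator and a p type selector, to target the appropriate p tag.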

from bs4 import BeautifulSoup as bs

html = '''
<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
                                THIS IS THE TEXT I WANT TO EXTRACT</p>
                                '''
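soup = bs(html, 'lxml')
# Class selector, adjacent sibling combinator (+), then the p type selector;
# strip() removes the whitespace around the text
print(soup.select_one('.Ingredients-title + p').text.strip())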
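
To illustrate why the selector matters once the page has more markup, here is a rough sketch. The extra `<p>` tags are hypothetical and made up for the example, and `html.parser` is used instead of lxml only to avoid the extra dependency. It also guards against `select_one` returning `None` and shows a similar lookup with `find_next_sibling`.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the first and last <p> tags are invented for illustration.
html = """
<p>Some unrelated introduction</p>
<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
    THIS IS THE TEXT I WANT TO EXTRACT</p>
<p>Another unrelated paragraph</p>
"""

# Assumption: "html.parser" (bundled with Python) stands in for 'lxml' here.
soup = BeautifulSoup(html, "html.parser")

# soup.p is the first <p> in the document, which is not the wanted one here.
first_p = soup.p.get_text(strip=True)

# Class selector + adjacent sibling combinator + type selector.
tag = soup.select_one(".Ingredients-title + p")
# select_one returns None when nothing matches, so check before reading text.
by_css = tag.get_text(strip=True) if tag is not None else None

# Similar lookup without CSS: find the span, then its next <p> sibling.
span = soup.find("span", class_="Ingredients-title")
sibling = span.find_next_sibling("p") if span is not None else None
by_nav = sibling.get_text(strip=True) if sibling is not None else None

print(first_p)
print(by_css)
print(by_nav)
```

The two lookups are not strictly identical: `+` only matches a `<p>` that immediately follows the span, while `find_next_sibling("p")` skips over other sibling elements until it reaches a `<p>`.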