Question

I want to extract the text "THIS IS THE TEXT I WANT TO EXTRACT" from the excerpt below. Does anyone have any suggestions? Thanks!

<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
                                THIS IS THE TEXT I WANT TO EXTRACT</p>

Answer 1

from bs4 import BeautifulSoup

html = """<span class="cw-type__h2 Ingredients-title">Ingredients</span><p>THIS IS THE TEXT I WANT TO EXTRACT</p>"""
soup = BeautifulSoup(html, 'lxml')
# soup.p is the first <p> tag in the document
print(soup.p.text)
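
Note that the snippet above drops the line break and indentation that the question's excerpt has inside the <p> tag. As a minimal sketch, assuming the markup is exactly as posted in the question, the following shows that .text keeps that leading whitespace while get_text(strip=True) removes it. The built-in 'html.parser' is used here only so the example does not depend on lxml being installed.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, including the whitespace inside <p>
html = """<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
                                THIS IS THE TEXT I WANT TO EXTRACT</p>"""

soup = BeautifulSoup(html, "html.parser")

raw = soup.p.text                     # keeps the newline and indentation
clean = soup.p.get_text(strip=True)   # strips surrounding whitespace

print(repr(raw))
print(clean)
```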

Answer 2

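Assuming there may be more HTML on the page, I would use the class of the span together with the adjacent sibling combinator and a p type selector to target the right p tag.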

from bs4 import BeautifulSoup as bs

html = '''
<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
                                THIS IS THE TEXT I WANT TO EXTRACT</p>
'''
soup = bs(html, 'lxml')
# '.Ingredients-title + p' matches the <p> that immediately follows the span
print(soup.select_one('.Ingredients-title + p').text.strip())
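
To illustrate why targeting the span matters, here is a rough sketch on a hypothetical larger page. The wrapping <div> and the extra intro paragraph are assumptions added for illustration and are not part of the original excerpt. It compares soup.p with the selector above, and shows find_next_sibling as an alternative that makes it easy to handle a missing span.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the <div> and the first <p> are assumed for illustration
html = """
<div>
<p>Some unrelated intro paragraph</p>
<span class="cw-type__h2 Ingredients-title">Ingredients</span>
<p>
    THIS IS THE TEXT I WANT TO EXTRACT</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.p returns the first <p> in the document, which is the wrong one here
first_p = soup.p.get_text(strip=True)

# The adjacent sibling selector picks the <p> right after the span
by_selector = soup.select_one(".Ingredients-title + p").get_text(strip=True)

# Alternative without CSS selectors: find the span, then its next <p> sibling
span = soup.find("span", class_="Ingredients-title")
target = span.find_next_sibling("p") if span is not None else None
by_sibling = target.get_text(strip=True) if target is not None else None

print(first_p)
print(by_selector)
print(by_sibling)
```

Both select_one and find return None when nothing matches, so on real pages it is worth checking the result before calling get_text on it.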

How do I use BeautifulSoup to extract the code in between?
