How do I scrape a paragraph that follows another paragraph matching a string?

Date: 2016-05-07 11:36:28

Tags: python web-scraping beautifulsoup

I want to scrape the paragraph that comes right after another paragraph containing the specific text "Interested String ZZZ".

For example:

<p align="center"><strong><span style="text-decoration: underline;">Interested String ZZZ</span></strong></p>
<p style="text-align: justify;"><span style="font-size: small;">This is the paragraph string that i want to scrape out</span></p>

How can I do this in Python?

1 Answer:

Answer 0 (score: 0)

Use the text argument to match on the element's text content, then call find_next_sibling() to get the next <p> sibling element:

>>> from bs4 import BeautifulSoup
>>> raw = '''<div>
... <p align="center"><strong><span style="text-decoration: underline;">Interested String ZZZ</span></strong></p>
... <p style="text-align: justify;"><span style="font-size: small;">This is the paragraph string that i want to scrape out</span></p>
... </div>'''
>>> soup = BeautifulSoup(raw, "lxml")
>>> [s.find_next_sibling("p").string for s in soup("p", text="Interested String ZZZ")]
[u'This is the paragraph string that i want to scrape out']
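
The same approach can be sketched as a standalone Python 3 script. This is only an illustration under a few assumptions: the helper name paragraphs_after is hypothetical, the built-in "html.parser" backend is swapped in so that lxml is not required, and the string argument is used because Beautiful Soup 4.4.0 and later use that name for what older versions called text. It also skips matches that have no following <p> instead of raising an AttributeError.

```python
from bs4 import BeautifulSoup


def paragraphs_after(html, marker):
    """Return the text of the next <p> sibling after each <p> whose text equals `marker`.

    `paragraphs_after` is a hypothetical helper name used only for this sketch.
    """
    # "html.parser" ships with Python; the original answer used "lxml".
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # string= matches a tag whose .string equals the marker; .string looks
    # through single-child wrappers such as <strong><span>...</span></strong>.
    for p in soup.find_all("p", string=marker):
        sibling = p.find_next_sibling("p")
        if sibling is not None:  # the matched <p> may be the last one
            results.append(sibling.get_text(strip=True))
    return results


raw = """<div>
<p align="center"><strong><span style="text-decoration: underline;">Interested String ZZZ</span></strong></p>
<p style="text-align: justify;"><span style="font-size: small;">This is the paragraph string that i want to scrape out</span></p>
</div>"""

print(paragraphs_after(raw, "Interested String ZZZ"))
```

Note that an exact string match only works when the heading paragraph contains nothing but the marker text. If the heading has extra text or whitespace around it, passing a compiled regular expression (for example re.compile("Interested String ZZZ")) as the string argument is one possible alternative.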