Beautiful Soup: extracting all data between tags

Time: 2018-04-05 08:56:22

Tags: python html beautifulsoup tags

<p>
 <strong>
  <em>
   Insurtech
  </em>
 </strong>
</p>
<p> .....Some data </p>
<p>
 <strong>
  <em>
   Biometrics
  </em>
 </strong>
</p>

I tried this:

html_tags = soup.find_all('em')
for i in range(len(html_tags) - 1):
    start_tag = html_tags[i]
    end_tag = html_tags[i + 1]
    between_tag = soup_str.split(str(start_tag))[1].split(str(end_tag))[0]
    soup1 = BeautifulSoup(between_tag, 'html.parser')

I want all the data from one p->strong->em tag up to the next p->strong->em tag. The HTML above is my sample data. Thanks in advance.

2 answers:

Answer 0 (score: 2)

s = '''<p>
 <strong>
  <em>
   Insurtech
  </em>
 </strong>
</p>
<p> .....Some data </p>
<p>
 <strong>
  <em>
   Biometrics
  </em>
 </strong>
</p>'''

from bs4 import BeautifulSoup
soup = BeautifulSoup(s, 'html.parser')

>>> list(soup.stripped_strings)
['Insurtech', '.....Some data', 'Biometrics']
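The flat list above no longer shows where one heading ends and the next begins. As a rough sketch of one way to keep that boundary (not part of the original answer), the snippet below walks `.next_siblings` from each heading `<p>` until it reaches the next heading `<p>`. The helper name `text_between` and the variable `headings` are made up for illustration, and it assumes that every heading is a `<p>` containing `strong > em` and that all the `<p>` elements are siblings.

```python
from bs4 import BeautifulSoup, Tag

html = """<p>
 <strong>
  <em>
   Insurtech
  </em>
 </strong>
</p>
<p> .....Some data </p>
<p>
 <strong>
  <em>
   Biometrics
  </em>
 </strong>
</p>"""

soup = BeautifulSoup(html, "html.parser")

# Heading paragraphs: the <p> ancestor of every p > strong > em element.
headings = [em.find_parent("p") for em in soup.select("p > strong > em")]


def text_between(start, end):
    """Collect the stripped text of the siblings after `start`, stopping at `end`."""
    parts = []
    for node in start.next_siblings:
        # Compare by identity: Tag equality compares content, not position.
        if node is end:
            break
        if isinstance(node, Tag):
            text = node.get_text(strip=True)
        else:
            text = str(node).strip()  # bare text node between tags
        if text:
            parts.append(text)
    return parts


# One list of text fragments per consecutive pair of headings.
between = [text_between(a, b) for a, b in zip(headings, headings[1:])]
print(between)  # [['.....Some data']]
```

Compared with splitting the serialized HTML on `str(start_tag)`, this stays on the parsed tree, so it does not depend on how the markup happens to be formatted.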

Answer 1 (score: 0)

You can use the .text attribute to access the information you need.

Example:

s = """<p>
 <strong>
  <em>
   Insurtech
  </em>
 </strong>
</p>
<p> .....Some data </p>
<p>
 <strong>
  <em>
   Biometrics
  </em>
 </strong>
</p>"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(s, "html.parser")
html_tags = soup.find_all('p')
for h in html_tags:
    print(h.text.strip())  # strip the surrounding whitespace and newlines

Output:

Insurtech
.....Some data
Biometrics
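The loop above prints every paragraph in order but does not say which heading each one belongs to. A minimal sketch of a variation that groups paragraphs under the most recent heading could look like this. The function name `group_by_heading` is hypothetical, and the rule that a heading is any `<p>` containing `strong > em` is an assumption based on the sample data in the question.

```python
from bs4 import BeautifulSoup

s = """<p>
 <strong>
  <em>
   Insurtech
  </em>
 </strong>
</p>
<p> .....Some data </p>
<p>
 <strong>
  <em>
   Biometrics
  </em>
 </strong>
</p>"""


def group_by_heading(markup):
    """Map each heading's text to the text of the paragraphs that follow it."""
    soup = BeautifulSoup(markup, "html.parser")
    sections = {}
    current = None
    for p in soup.find_all("p"):
        heading = p.select_one("strong > em")
        if heading is not None:
            # A new heading starts a new, initially empty group.
            current = heading.get_text(strip=True)
            sections[current] = []
        elif current is not None:
            # An ordinary paragraph belongs to the latest heading seen.
            sections[current].append(p.get_text(strip=True))
    return sections


sections = group_by_heading(s)
print(sections)  # {'Insurtech': ['.....Some data'], 'Biometrics': []}
```

Paragraphs that appear before the first heading are skipped here, and two headings with the same text would overwrite each other. Both are simplifications you may want to change for real data.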