Get only the direct text of a tag with BeautifulSoup in Python

Time: 2018-02-13 12:39:38

Tags: python html python-3.x beautifulsoup

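I am parsing an HTML document. I want to get specific tags and handle them separately from other tags, but I am running into problems with tags nested inside tags. Can someone suggest how to get only a tag's own text, without the text of the tags nested inside it?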

<p> I want this text <b> I want to parse this separately </b> I also want this text </p>

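1 Answer:

Answer 0: (score: 1)

You can use NavigableString. Text that sits directly inside a tag is stored as NavigableString children of that tag, so filtering a tag's children by that type keeps its own text and skips nested tags: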

from bs4 import BeautifulSoup, NavigableString

html = '''<p> I want this text1 <b> I want to parse this separately1 </b> I also want this text1 </p>
<p> I want this text2 <b> I want to parse this separately2 </b> I also want this text2 </p>'''
soup = BeautifulSoup(html, 'html.parser')
for p in soup.find_all('p'):
    # Iterating over a tag yields its direct children; keep only the text nodes
    outer_text = ' '.join(x.strip() for x in p if isinstance(x, NavigableString))
    print(outer_text)
    # The nested <b> tag is handled separately (assumes every <p> contains a <b>)
    inner_text = p.b.text.strip()
    print(inner_text)

Output:

I want this text1 I also want this text1
I want to parse this separately1
I want this text2 I also want this text2
I want to parse this separately2
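The same idea can also be written without an explicit isinstance check. The following is a minimal sketch, not part of the original answer: it assumes Beautiful Soup 4.4.0 or later (where find_all accepts the string argument), and direct_text is a hypothetical helper name introduced only for illustration. It also contrasts the result with get_text(), which includes the text of nested tags.

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Return only the text that is a direct child of `tag` (hypothetical helper)."""
    # string=True matches text nodes; recursive=False limits the search to direct children
    pieces = (s.strip() for s in tag.find_all(string=True, recursive=False))
    # Drop whitespace-only pieces so the join does not produce doubled spaces
    return ' '.join(piece for piece in pieces if piece)


html = '<p> I want this text <b> I want to parse this separately </b> I also want this text </p>'
soup = BeautifulSoup(html, 'html.parser')
p = soup.p

own_text = direct_text(p)
# Collect every direct <b> child, so a <p> without a <b> yields an empty list instead of an error
nested_text = [b.get_text(strip=True) for b in p.find_all('b', recursive=False)]
# For comparison: get_text() walks the whole subtree, nested tags included
all_text = p.get_text(' ', strip=True)

print(own_text)     # I want this text I also want this text
print(nested_text)  # ['I want to parse this separately']
print(all_text)     # I want this text I want to parse this separately I also want this text
```

Using find_all('b', recursive=False) rather than p.b avoids an AttributeError on paragraphs that have no nested b tag, at the cost of returning a list instead of a single string.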