Question

Suppose I have an HTML snippet and I only want get_text() to return the text at the immediate level:

from bs4 import BeautifulSoup
s = "<div><p><strong>College Type:</strong> \r\nPrivate Un-aided\r\n</p></div>"
soup = BeautifulSoup(s, 'lxml')
print(soup.find('p').get_text())

This prints:

College Type: 
Private Un-aided

But I only want:

Private Un-aided

that is, only the text directly inside the <p> tag, ignoring the text in the child <strong> tag.

Answer 1

You can search inside the <p> tag for text nodes and specify that you don't want to recurse into child tags:

>>> print(soup.find('p').find(text=True, recursive=False))

Private Un-aided
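
Note that find() returns only the first matching text node, with its surrounding whitespace intact. If the <p> tag may contain several direct text nodes, a minimal sketch along the following lines could collect and clean all of them. The helper name direct_text is hypothetical, and 'html.parser' is swapped in for 'lxml' here only so the example runs without the extra dependency:

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Return only the text nodes that are direct children of `tag` (hypothetical helper)."""
    # recursive=False restricts the search to the tag's immediate children,
    # so text inside nested tags such as <strong> is skipped.
    pieces = tag.find_all(string=True, recursive=False)
    # Join all direct text nodes and trim the leading/trailing whitespace.
    return "".join(pieces).strip()


s = "<div><p><strong>College Type:</strong> \r\nPrivate Un-aided\r\n</p></div>"
# 'html.parser' is an assumption made to avoid requiring lxml.
soup = BeautifulSoup(s, "html.parser")
print(direct_text(soup.find("p")))  # Private Un-aided
```

The string=True argument is the newer spelling of text=True in Beautiful Soup 4; both select text nodes rather than tags.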

Python `bs4.BeautifulSoup.get_text()` - get text from the immediate level only
