Answer 0 (score: 0)
from bs4 import BeautifulSoup
html = '''
<p>
<strong>123</strong>
A CTS PAC is nearing its expiration date
</p>
'''
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')
# The message text follows <strong>, so it is the last child node of <p>
text = list(p.children)[-1]
print(text.strip())
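To see why index -1 works, it helps to list what p.children actually yields. Below is a minimal sketch on the same snippet: the children are a newline text node, the <strong> tag, and then the message text. It also shows strong.next_sibling, an alternative not used in the answer above, which reaches the same text without assuming it is the last child (it does assume the text directly follows <strong>).

```python
from bs4 import BeautifulSoup

html = '''
<p>
<strong>123</strong>
A CTS PAC is nearing its expiration date
</p>
'''
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')

# p.children yields text nodes (NavigableString) and elements (Tag) in document order
kinds = [type(child).__name__ for child in p.children]
print(kinds)

# Alternative: take the node immediately after <strong> instead of the last child
after_strong = p.strong.next_sibling
print(after_strong.strip())
```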
If you have more than one <p> tag, select the right one by index:
from bs4 import BeautifulSoup
html = '''
<p>
Other p-tag
</p>
<p>
<strong>123</strong>
A CTS PAC is nearing its expiration date
</p>
'''
soup = BeautifulSoup(html, 'html.parser')
all_p = soup.find_all('p')
# all_p[1] is the second <p>, the one that contains <strong>
text = list(all_p[1].children)[-1]
print(text.strip())
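Hard-coding all_p[1] breaks as soon as the page gains or loses a paragraph. Here is a sketch of a more general approach, assuming every message paragraph starts with a <strong> label followed by a text node: loop over all <p> tags, skip those without <strong>, and pair each label with the text that follows it.

```python
from bs4 import BeautifulSoup

html = '''
<p>
Other p-tag
</p>
<p>
<strong>123</strong>
A CTS PAC is nearing its expiration date
</p>
'''
soup = BeautifulSoup(html, 'html.parser')

messages = []
for p in soup.find_all('p'):
    strong = p.find('strong')
    # Skip paragraphs without a <strong> label, such as "Other p-tag"
    if strong is None:
        continue
    sibling = strong.next_sibling
    # NavigableString subclasses str; skip if the next node is a tag or missing
    if isinstance(sibling, str):
        messages.append((strong.get_text(), sibling.strip()))

print(messages)
```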
Answer 1 (score: 0)
I believe this HTML comes from the cisco.com website. If so, here is a direct answer to your question.
>>> url = 'https://www.cisco.com/c/en/us/td/docs/security/asa/syslog/b_syslog/syslogs10.html'
>>> import bs4
>>> import requests
>>> page = requests.get(url).content
>>> soup = bs4.BeautifulSoup(page, 'lxml')
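First, I tried searching for the exact string. However, a closer look at the page showed that the text is followed by a period and some trailing whitespace, so an exact match finds nothing: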
>>> near = soup.find_all(string='A CTS PAC is nearing its expiration date')
>>> near
[]
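A regular expression, on the other hand, finds the string despite the trailing characters, because BeautifulSoup tests a compiled pattern with search(), i.e. it can match anywhere inside the text node: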
>>> import re
>>> near = soup.find_all(string=re.compile('A CTS PAC is nearing its expiration date'))
>>> near
['A CTS PAC is nearing its expiration date.\n\t ']
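Because the live page may change, here is a minimal offline sketch of the same situation. It uses a hypothetical snippet whose text ends in ".\n\t " like the output above. An exact string= match fails, a compiled regex succeeds, and a callable that strips whitespace and the trailing period (an illustrative helper, is_target, not from the answer) works too.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the trailing ".\n\t " seen on the real page
html = '<p><strong>123</strong>A CTS PAC is nearing its expiration date.\n\t </p>'
soup = BeautifulSoup(html, 'html.parser')
target = 'A CTS PAC is nearing its expiration date'

# Exact match: the whole text node must equal target, so nothing is found
exact = soup.find_all(string=target)

# Regex match: the pattern is searched for anywhere inside each text node
pattern = soup.find_all(string=re.compile(re.escape(target)))

# Callable match: normalize each text node before comparing
def is_target(s):
    return s.strip().rstrip('.') == target

normalized = soup.find_all(string=is_target)

print(exact)
print(pattern)
print(normalized)
```

re.escape is used so that any regex metacharacters in the searched text are treated literally. The callable is stricter than the regex, since it only accepts nodes whose normalized text equals the message exactly.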