In a previous post of mine, I was able to retrieve all the p tags:
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url='https://www.centralpark.com/things-to-do/central-park-zoo/polar-bears/'
# opening up connection
uClient = uReq(my_url)
page_html = uClient.read()
# close connection
uClient.close()
page_soup = soup(page_html, features="html.parser")
ps=list(page_soup.find_all('p'))
for s in ps:
    print(s)
What I want is to retrieve whatever is inside those p tags. For example:
ex1='<p> this is example </p>' -> I want res1 = 'this is example'
ex2='<p> this is <strong> nice </strong> example </p>' -> I want res2 = 'this is nice example'
ex3='<p> this is <b> okeyish </b> example </p>' -> I want res3 = 'this is okeyish example'
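All the results (res1, res2, res3) can go into a list.
I have been looking for a solution, but the ones I found only handle one specific type of nested tag. What I want is simply to retrieve everything between <p> and </p>, regardless of which other tags appear in between. If those other tags contain content, that content should be included too.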
Answer 0 (score: 1)
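ps = page_soup.find_all('p')
results = []
for s in ps:
    # print(s.text)
    # s.text is the text of the <p> together with the text of any nested tags.
    # list.append() returns None, so do not assign its result back to results.
    results.append(s.text)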
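To check this against the three examples from the question, here is a minimal sketch. It assumes the paragraphs are parsed from a string rather than fetched from the URL, and the helper name paragraph_texts is made up for illustration. Since .text keeps the spaces that surrounded the inner tags, the sketch also collapses repeated whitespace, which the question's expected results seem to imply.

```python
from bs4 import BeautifulSoup

# The three example paragraphs from the question.
examples = [
    '<p> this is example </p>',
    '<p> this is <strong> nice </strong> example </p>',
    '<p> this is <b> okeyish </b> example </p>',
]


def paragraph_texts(html):
    """Hypothetical helper: text of every <p> in html, whitespace-normalized."""
    page_soup = BeautifulSoup(html, features="html.parser")
    # get_text() joins the text of the tag and all of its descendants;
    # split() followed by join() collapses runs of whitespace and trims the ends.
    return [" ".join(p.get_text().split()) for p in page_soup.find_all("p")]


results = paragraph_texts("".join(examples))
print(results)
# ['this is example', 'this is nice example', 'this is okeyish example']
```

If you need the original spacing preserved, drop the split/join step and use p.get_text() (or p.text) directly.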