How can I extract all <p> tags from the following HTML using BeautifulSoup in Python? The code I am trying is shown below.
HTML code:
<div class="text_details">
<p>
Allah's Messenger (ﷺ) said: Islam is based on (the following) five (principles):
</p>
<p> 1. To testify that none has the right to be worshipped but Allah and Muhammad is Allah's Messenger (ﷺ).</p>
<p> 2. To offer the (compulsory congregational) prayers dutifully and perfectly.</p>
<p> 3. To pay Zakat (i.e. obligatory charity)</p>
<p> 4. To perform Hajj. (i.e. Pilgrimage to Mecca)</p>
<p> 5. To observe fast during the month of Ramadan.</p>
<p></p>
</div>
Code:
import requests
from bs4 import BeautifulSoup
url = "https://www.sunnah.com/bukhari/11"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
for pp in soup.find_all(class_='text_details').p:
    print pp.text
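Answer 0 (score: 0)
You should use find (which returns a single tag) to get the div tag, and then call find_all (which returns a list of tags) on it to get the p tags. In your code, find_all returns a list-like ResultSet, which has no .p attribute, so the loop fails before it starts.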
In [59]: for pp in soup.find(class_='text_details').find_all('p'):
    ...:     print(pp.text)
    ...:
I heard Allah's Messenger (ﷺ) (p.b.u.h) saying, "We (Muslims) are the last (to come) but (will be) the
foremost on the Day of Resurrection though the former nations were given the Holy Scriptures before
us. And this was their day (Friday) the celebration of which was made compulsory for them but they
differed about it. So Allah gave us the guidance for it (Friday) and all the other people are behind us in
this respect: the Jews' (holy day is) tomorrow (i.e. Saturday) and the Christians' (is) the day after
tomorrow (i.e. Sunday)."
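To see the difference without hitting the network, here is a minimal sketch that runs the same find / find_all chain on a shortened copy of the HTML from the question. The inline HTML string, the use of the built-in "html.parser" instead of lxml, and the variable names are my own assumptions for illustration; the live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question, inlined so no request is needed.
HTML = """
<div class="text_details">
<p> 2. To offer the (compulsory congregational) prayers dutifully and perfectly.</p>
<p> 3. To pay Zakat (i.e. obligatory charity)</p>
<p> 4. To perform Hajj. (i.e. Pilgrimage to Mecca)</p>
<p></p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all returns a ResultSet (a list subclass), so attribute access
# such as .p raises AttributeError. This is the bug in the question.
try:
    soup.find_all(class_="text_details").p
    failed = False
except AttributeError:
    failed = True

# find returns a single Tag; calling find_all on that Tag yields its <p> tags.
div = soup.find(class_="text_details")
paragraphs = [p.get_text(strip=True) for p in div.find_all("p")]

# The trailing <p></p> yields an empty string, so filter it out before printing.
non_empty = [text for text in paragraphs if text]
for text in non_empty:
    print(text)
```

Note that find returns None when nothing matches, so on a real page it may be worth checking the result before chaining find_all.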
The div tag contains only p tags, so you can get all of the text at once:
In [60]: soup.find(class_='text_details').text
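One thing to be aware of: .text simply concatenates every string inside the div, so paragraph boundaries are lost. A small sketch of the difference, using get_text with its separator and strip arguments; the two-paragraph HTML string here is a cut-down copy of the question's markup that I assume for illustration.

```python
from bs4 import BeautifulSoup

# Cut-down copy of the question's markup, written on one line on purpose.
HTML = (
    '<div class="text_details">'
    "<p> 4. To perform Hajj. (i.e. Pilgrimage to Mecca)</p>"
    "<p> 5. To observe fast during the month of Ramadan.</p>"
    "<p></p>"
    "</div>"
)

soup = BeautifulSoup(HTML, "html.parser")
div = soup.find(class_="text_details")

# .text joins all strings with no separator, keeping the leading spaces.
raw_text = div.text

# get_text can strip each string and join them with a separator of your choice.
clean_text = div.get_text(separator="\n", strip=True)
print(clean_text)
```

With strip=True, empty strings are dropped, so the trailing empty <p> does not add a blank line.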
Answer 1 (score: 0)
You can use select to grab every p that is a direct child of the text_details div, like this:
import requests
from bs4 import BeautifulSoup
url = "https://www.sunnah.com/bukhari/11"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
for pp in soup.select("div.text_details > p"):
    print(pp.text)
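The ">" in the selector matters only when there are nested paragraphs. As a rough illustration, the sketch below compares the child combinator with the plain descendant selector; the HTML with a blockquote is a made-up example, not taken from the page in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <p> directly under the div, one nested deeper.
HTML = """
<div class="text_details">
<p>direct child</p>
<blockquote><p>nested paragraph</p></blockquote>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "div.text_details > p" matches only <p> tags whose parent is the div.
direct = [p.get_text() for p in soup.select("div.text_details > p")]

# "div.text_details p" matches <p> tags at any depth inside the div.
descendants = [p.get_text() for p in soup.select("div.text_details p")]

print(direct)
print(descendants)
```

For the HTML in the question, where every p sits directly inside the div, both selectors should return the same tags.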