Question

I am using the code below and getting the following output:

<p>Ibn Umar reported: I passed by the Messenger of Allah, peace and blessings be upon him, while my garment was trailing. The Prophet said, “<b>O Abdullah, raise your garment</b>.” I lifted it up and he told me to raise it higher and I did so. Some of the people said, “To where should it be raised?” The Prophet said, “<b>In the middle of the shins</b>.”</p>

I was wondering whether you could help me get rid of the <p>, </p>, and <b> tags.

Code:

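import urllib2  # Python 2 standard library
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4; the original import was not shown

url1 = "http://www.dailyhadithonline.com/2013/07/21/hadith-on-clothing-the-lower-garment-should-be-hallway-between-the-shins/"
content1 = urllib2.urlopen(url1).read()
soup = BeautifulSoup(content1)
english_hadith = soup.findAll('p')[0]  # first <p> element on the page
print english_hadith  # prints the element including its markup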

Answer 1

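You can do this with nltk. Note that this applies to NLTK 2.x: in NLTK 3, clean_html is no longer implemented and the library points to BeautifulSoup's get_text() instead.

Example: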

from nltk import clean_html
html = "..."
clean_html(html)
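
For readers without an NLTK version that still provides clean_html, the same idea (keep the text, drop the markup) can be sketched with only the Python 3 standard library. This is a minimal sketch, not what nltk does internally; the names TagStripper and strip_tags are hypothetical helpers introduced for illustration.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of html with all tags removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


html = "<p>The Prophet said, <b>O Abdullah, raise your garment</b>.</p>"
print(strip_tags(html))
```

Because it uses a real HTML parser, this approach also copes with attributes and nested tags, which simple string replacement does not.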

Answer 2

I would suggest using a regular expression instead of BeautifulSoup.

>>> import re
>>> a='<p>dhhdhd<p>dhdhd</p>'
>>> re.sub('<p>|</p>','',a)
'dhhdhddhdhd'

A more general regular expression, which also handles <p> tags that carry attributes, is:

re.sub('<p[^>]*>|</p>','',a)
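
Since the question also asks about <b>, the pattern can be widened to cover both tags. The following is a sketch under a few assumptions: the name remove_p_and_b and the sample string are made up for illustration, and the word boundary \b is added so that tags such as <pre> or <body>, which '<p[^>]*>' would also match, are left alone.

```python
import re

# Matches opening or closing <p> and <b> tags, with optional attributes.
# \b stops the pattern from also matching tags such as <pre> or <body>.
TAG_RE = re.compile(r"</?(?:p|b)\b[^>]*>", re.IGNORECASE)


def remove_p_and_b(html):
    """Hypothetical helper: strip only <p> and <b> tags, keeping their text."""
    return TAG_RE.sub("", html)


sample = '<p class="x">In the <b>middle</b> of the shins.</p><pre>kept</pre>'
print(remove_p_and_b(sample))  # In the middle of the shins.<pre>kept</pre>
```

Regular expressions work for small, predictable fragments like this one; for arbitrary HTML a parser is usually the safer choice.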

Answer 3

You're close. Use the .text attribute of the element, which returns its text without the tags:

print english_hadith.text

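Output:

Ibn Umar reported: I passed by the Messenger of Allah, peace and blessings be upon him, while my garment was trailing. The Prophet said, “O Abdullah, raise your garment.” I lifted it up and he told me to raise it higher and I did so. Some of the people said, “To where should it be raised?” The Prophet said, “In the middle of the shins.”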

Python: remove all instances of <p> and </p> from a string using BeautifulSoup

3 answers: