How do I remove the following (<span class=saws></span>) from the string below?
<p>In the house of Um-Salama I saw Allah's Messenger (<span class=saws></span>) offering prayers, wrapped in a single garment
around his body with its ends crossed round his shoulders.</b></div>
I have tried everything. I managed to remove <span class=saws></span>, but now I can't get rid of the ().
Code:
import re
from bs4 import BeautifulSoup
from lxml import etree

url = "http://www.sunnah.com/bukhari/8"
parser = etree.HTMLParser()
html = etree.parse(url, parser)
result = etree.tostring(html.getroot(), pretty_print=True, method="html")
soup = BeautifulSoup(result)
results = soup.findAll("div", {"class" : "actualHadithContainer"})
for result in results:
    en = re.sub('</span>|<div class="text_details">|</div>|</p>|<p>|<span class=|[??]|("saws">)','',str(result.find("div", {"class" : "text_details"})))
    en1 = re.sub('()','',str(en))
    print en1
    ar1 = re.sub('<span class="arabic_sanad arabic">|</span>','',str(result.find("span", {"class" : "arabic_sanad arabic"})))
    ar2 = re.sub('<span class="arabic_text_details arabic">|</span>|<span class="arabic_text_details arabic">','',str(result.find("span", {"class" : "arabic_text_details arabic"})))
    print ar1 + ar2
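Why the second re.sub call changes nothing: in a regular expression, ( and ) are grouping metacharacters, so the pattern '()' is an empty group that matches only the empty string, and each "replacement" swaps nothing for nothing. A minimal sketch (Python 3, standard re module only; the sample text is shortened from the question) contrasting it with an escaped pattern:

```python
import re

text = "I saw Allah's Messenger () offering prayers"

# '()' is an empty capturing group: it matches the empty string at every
# position, so substituting '' for it leaves the text unchanged.
unchanged = re.sub('()', '', text)

# Escaping the parentheses makes them literal characters.
escaped = re.sub(r'\(\)', '', text)

# re.escape() builds the same literal pattern automatically.
also_escaped = re.sub(re.escape('()'), '', text)

print(unchanged)  # I saw Allah's Messenger () offering prayers
print(escaped)    # I saw Allah's Messenger  offering prayers
```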
Answer 0 (score: 2)
Something as simple as (\(<span\sclass\=saws\>.*</span>\))
will remove the whole (<span class=saws></span>), parentheses included.
See http://regex101.com/r/uL3fV4 for a live demo.
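A minimal sketch of applying this pattern with Python's re.sub (the sample string is adapted from the question). One caveat worth flagging: .* is greedy, so on a line that contains two such spans it also deletes everything between them. The non-greedy .*? variant below is a suggestion, not part of the original answer.

```python
import re

# Pattern from the answer: literal "(", the span, literal ")".
GREEDY = r'(\(<span\sclass\=saws\>.*</span>\))'
# Suggested variant: ".*?" stops at the first "</span>)" it can reach.
LAZY = r'(\(<span\sclass\=saws\>.*?</span>\))'

html = ("I saw Allah's Messenger (<span class=saws></span>) offering prayers, "
        "and again (<span class=saws></span>) later")

# Greedy: one match spanning from the first "(" to the last ")".
greedy_result = re.sub(GREEDY, '', html)
# Lazy: each "(<span class=saws></span>)" is removed separately.
lazy_result = re.sub(LAZY, '', html)

print(greedy_result)
print(lazy_result)
```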
Answer 1 (score: 0)
My example with BeautifulSoup:
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(u"""<p>In the house of Um-Salama I saw Allah's
Messenger (<span class=saws></span>) offering prayers, wrapped in a single garment
around his body with its ends crossed round his shoulders.</b></div>""")
results = soup.findAll()
for tag in results:
    # bs4 returns "class" as a list, so test membership rather than equality
    if tag.name == 'span' and 'saws' in tag.attrs.get('class', []):
        tag.extract()
print re.sub(ur'\(\)', u'', unicode(soup))
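After the span is extracted, the text reads "Messenger () offering", and deleting just "()" leaves a double space behind. A small sketch (Python 3, standard re module only; the extra \s* is an illustrative assumption, not part of the answer) that also consumes the whitespace in front of the empty parentheses:

```python
import re

# Text as it looks after the <span class=saws> tag has been extracted.
text = "I saw Allah's Messenger () offering prayers"

# \s* also eats the whitespace before "()", so no double space remains.
cleaned = re.sub(r'\s*\(\)', '', text)
print(cleaned)  # I saw Allah's Messenger offering prayers
```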
Answer 2 (score: 0)
#! /usr/bin/env python
from bs4 import BeautifulSoup
import urllib2
import lxml
from lxml import etree
import re

url = "http://www.sunnah.com/bukhari/8"
parser = etree.HTMLParser()
html = etree.parse(url, parser)
result = etree.tostring(html.getroot(), pretty_print=True, method="html")
# content1 = urllib2.urlopen(url).read()
soup = BeautifulSoup(result)
results = soup.findAll("div", {"class" : "actualHadithContainer"})
for result in results:
    en = re.sub('</span>|<div class="text_details">|</div>|</p>|<p>|[??]|\(<span class="saws"></span>\)|<b>|</b>','',str(result.find("div", {"class" : "text_details"})))
    print en
    ar1 = re.sub('<span class="arabic_sanad arabic">|</span>','',str(result.find("span", {"class" : "arabic_sanad arabic"})))
    ar2 = re.sub('<span class="arabic_text_details arabic">|</span>|<span class="arabic_text_details arabic">','',str(result.find("span", {"class" : "arabic_text_details arabic"})))
    print ar1 + ar2
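The pattern in this answer quotes the attribute value (class="saws"), presumably because that is how the lxml-serialized HTML spells it, whereas the string in the question shows it unquoted. If one cleanup step has to cope with both forms, one option is to make the quote optional and require the same quote on both sides. A minimal sketch with a hypothetical helper strip_saws (Python 3, standard re module only):

```python
import re

# Optional whitespace, literal "(", an empty saws span whose class value may
# be quoted with " or ' or not at all (\1 repeats the same quote), literal ")".
SAWS_RE = re.compile(r'\s*\(<span class=(["\']?)saws\1></span>\)')


def strip_saws(fragment):
    """Hypothetical helper: delete every "(<span class=saws></span>)" marker
    together with the whitespace in front of it."""
    return SAWS_RE.sub('', fragment)


quoted = 'Messenger (<span class="saws"></span>) offering prayers'
unquoted = 'Messenger (<span class=saws></span>) offering prayers'

print(strip_saws(quoted))    # Messenger offering prayers
print(strip_saws(unquoted))  # Messenger offering prayers
```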