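.replace not working in Beautiful Soup

Time: 2013-08-27 11:47:09

Tags: python replace beautifulsoup

This is only part of the code. It does not replace the div and href values. The elements are instances of Beautiful Soup's Tag class.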

soup = BeautifulSoup(ourUrl)
dem = soup.findAll('p')
for i in range(0,len(dem)-1):
    dk = dem[i]
    if ('<div') in dk:
        print "it here"
        dk = dk.replace('<div','<!--')
        dk = dk.replace('</div>','--->')
        dem[i] = dk
for i in range(0,len(dem)-1):
    dk = dem[i]
    if ('<a href') in dk:
        print "it here"
        dk = dk.replace('<a href','<!--')
        dk = dk.replace('</a>','--->')
        dem[i] = dk

The value of dem looks something like this:

dem =[    <p class="left-text padding-left-10">
<a href="/people" class="red-text">See all people</a>
</p>
<p class="left-text padding-left-10">
<a href="/tv" class="red-text" style="display:inline;">See all bio TV</a>
<span class="divider">&nbsp;|&nbsp;&nbsp;</span>
<a href="/tv/daily-schedule" class="red-text" style="display:inline;">See schedule </a>
</p>
<p class="left-text bottom-flyout-video-padding">
<a href="/videos" class="red-text ">See all videos</a>
</p>
<p class="left-text padding-left-10">
<a href="http://shop.history.com/?v=biography" class="red-text">Shop now</a>
</p>
<p>TV14 </p>
<p>He rose from the slums of Brooklyn to take on the biggest Mafia dons of the 1950s and 1960s. Joey Gallo began his criminal career as a small-time loan shark and jukebox racketeer. He became a top enforcer in the Profaci crime family, but felt he never got the respect he deserved. So Gallo formed his own gang and revolted against mafia Don Joe Profaci in a long, bloody war on the streets of New York. But there was another side to Joey Gallo--the ruthless mob leader was also an artist and an avid reader. Living in Greenwich Village with his wife Jeffie, Gallo was inspired by his beatnik neighbors and their counterculture ideas. He also began hobnobbing with New York's social elite, befriending everyone from Neil Simon to Jerry Orbach. In the end though, nothing could save Joey Gallo from a dramatic end.</p>
<p>TV14 </p>
<p>
<p> Charles Darwin, <a href="/people/charles-darwin-9266433">http://www.biography.com/people/charles-darwin-9266433</a> (last visited Aug 27, 2013).</p>
<p> Charles Darwin. The Biography Channel website. 2013. Available at: <a href="/people/charles-darwin-9266433">http://www.biography.com/people/charles-darwin-9266433</a>. Accessed Aug 27, 2013. </p>
<p>Naturalist Charles Darwin was born in Shrewsbury, England, on February 12, 1809. In 1831, he embarked on a five-year survey voyage around the world on the HMS <i>Beagle</i>. His studies of specimens around the globe led him to formulate his theory of evolution and his views on the process of natural selection. In 1859, he published <i>On the Origin of Species</i>. He died on April 19, 1882, in London.</p>
<p><span class="body">A man who dares to waste one hour of time has not discovered the value of life.</span></p>


                            571 people in this group<br />
</p>]

The full value of dem is too large to paste here, so I have given you an excerpt. Even if there

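1 answer:

Answer 0: (score: 0)

If you want to replace an element with a comment that contains the replaced markup, replace the object with a new bs4.Comment() object: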

from bs4 import Comment

for para in soup.find_all('p'):
    for div in para.find_all('div'):
        div.replace_with(Comment(unicode(div)))
    for link in para.find_all('a', href=True):
        link.replace_with(Comment(unicode(link)))
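
The snippet above targets Python 2, where unicode() exists. A minimal sketch of the same idea for Python 3 follows, assuming bs4 is installed and using the built-in "html.parser"; the sample markup is a shortened, hypothetical stand-in for the question's data, and str() takes the place of unicode().

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup, shortened from the excerpt in the question.
html = (
    '<p class="left-text"><a href="/people" class="red-text">See all people</a></p>'
    '<p>TV14 </p>'
)

soup = BeautifulSoup(html, "html.parser")

for para in soup.find_all("p"):
    for link in para.find_all("a", href=True):
        # str(link) serializes the tag to markup; Comment wraps that text,
        # so it is written out between <!-- and --> when the soup is rendered.
        link.replace_with(Comment(str(link)))

result = str(soup)
print(result)
```

After the loop the tree no longer contains any a elements; their markup survives only as comment text.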

In Python, rather than using a for loop over range(), loop over the sequence directly; in the code above, I iterate over the .find_all() results directly.
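
As a small illustration of that point, the sketch below uses a plain list with made-up values rather than real Beautiful Soup results. It also shows a side effect of the index-based loops in the question: range(0, len(dem)-1) stops one short, so the last element is never visited.

```python
# Hypothetical stand-in for the list of results.
items = ["first", "second", "third"]

# range(0, len(items) - 1) yields 0 and 1 only, so "third" is skipped.
visited_by_index = [items[i] for i in range(0, len(items) - 1)]

# Looping over the sequence directly visits every element.
visited_directly = [item for item in items]

# enumerate() supplies the index in the cases where it is really needed.
indexed = [(i, item) for i, item in enumerate(items)]

print(visited_by_index)
print(visited_directly)
print(indexed)
```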

BeautifulSoup elements print as if they were plain HTML text, but they are not strings; they are Tag() objects. Don't try to treat them as strings.
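
The difference can be checked directly. This is a minimal sketch, assuming Python 3 with bs4 installed and a one-line hypothetical sample document. It suggests why the `'<a href' in dk` test in the question never succeeds: a membership test on a Tag looks at the tag's child nodes, not at its serialized markup.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical one-paragraph sample based on the question's excerpt.
soup = BeautifulSoup('<p><a href="/tv">See all bio TV</a></p>', "html.parser")
para = soup.find("p")

is_tag = isinstance(para, Tag)     # the element is a Tag object
is_string = isinstance(para, str)  # and it is not a string

# "in" on a Tag checks its children, so the markup text is not found.
found_in_tag = "<a href" in para

# Serialize with str() first if the markup really must be searched as text.
found_in_text = "<a href" in str(para)

print(is_tag, is_string, found_in_tag, found_in_text)
```

Even where a text search like this works, modifying the tree with methods such as replace_with() is usually the safer route than editing the serialized markup.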