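Python BeautifulSoup - Exclude the find() tag and replace DIVs

Time: 2017-06-25 20:22:21

Tags: python beautifulsoup

I have the following code. It successfully gets the content I need, but the result also includes the tag I am searching for. How can I exclude it?

Also, the content uses DIV tags rather than P tags. How can I change every <div>...</div> tag to <p>...</p>?

Sample output: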

<div class="article__body"><div>One</div><div>Two</div><div>Three</div></div>

Desired output:

<p>One</p><p>Two</p><p>Three</p>

CODE:

from urllib.request import urlopen

from bs4 import BeautifulSoup

# Fetch the page and parse it (Python 3 / bs4 equivalents of urllib2 and BeautifulSoup 3)
html_page = urlopen("http://example.com/news/1234")
soup = BeautifulSoup(html_page, "html.parser")

# Each find() returns the matching tag itself, wrapper included
print(soup.find("h2", {"class": "article__title"}))
print("=================================")
print(soup.find("div", {"class": "article__body"}))
print("=================================")
print(soup.find("div", {"class": "article__image"}))

1 answer:

Answer 0 (score: 0)

To replace a tag with a different tag, you can use the replace_with method.

CODE:

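from bs4 import BeautifulSoup

# html5lib is a separate install; it wraps the fragment in <html>, <head> and <body>
soup = BeautifulSoup('<div class="article__body"><div>One</div><div>Two</div><div>Three</div></div>', "html5lib")

# The empty class filter is meant to select the inner divs, which carry no class attribute
for div in soup.find_all('div', {'class': ''}):
    p = soup.new_tag('p')   # build a fresh <p>
    p.string = div.text     # copy the text of the div into it
    div.replace_with(p)     # swap the div for the new <p>

# Unwrap every tag that is not a <p> (the outer div and the wrappers added by html5lib)
for tag in soup.find_all(lambda x: x.name != 'p'):
    tag.unwrap()

print(soup)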
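
A shorter variant may also work here. This is only a sketch, not part of the answer above, and it assumes the built-in "html.parser" (so no html/body wrappers are added) and that the wanted divs are direct children of the article__body div. It renames each child tag in place by assigning to its name attribute, then uses decode_contents() to render the children without the wrapper tag that find() returned. The variable names body and inner_html are made up for illustration.

```python
from bs4 import BeautifulSoup

html = '<div class="article__body"><div>One</div><div>Two</div><div>Three</div></div>'

# Assumption: html.parser is used, so the fragment is not wrapped in <html>/<body>
soup = BeautifulSoup(html, "html.parser")

# find() returns the wrapper tag together with everything inside it
body = soup.find("div", {"class": "article__body"})

# Rename each direct child <div> to <p> in place; the children keep their contents
for child in body.find_all("div", recursive=False):
    child.name = "p"

# decode_contents() renders only the children, leaving out the wrapper tag itself
inner_html = body.decode_contents()
print(inner_html)
```

Renaming in place keeps any nested markup inside each child, whereas copying div.text into a new tag keeps only the text.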