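I have the following code. It successfully gets the content I need, but the result also includes the tag I'm searching for. How can I exclude that tag?
Also, the content uses `<div>` tags instead of `<p>` tags. How can I change every `<div>...</div>`
tag to `<p>...</p>`?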
Example output:

```html
<div class="article__body"><div>One</div><div>Two</div><div>Three</div></div>
```

Desired output:

```html
<p>One</p><p>Two</p><p>Three</p>
```
Code:

```python
from BeautifulSoup import BeautifulSoup
import urllib2
import re
html_page = urllib2.urlopen("http://example.com/news/1234")
soup = BeautifulSoup(html_page)
print soup.find("h2", {"class": "article__title"})
print ("=================================")
print soup.find("div", {"class": "article__body"})
print ("=================================")
print soup.find("div", {"class": "article__image"})
```
Answer:
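To replace one tag with another, you can use the `replace_with` method. To remove a tag while keeping its contents (such as the outer `article__body` wrapper), use `unwrap`.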
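As for the first part of the question: printing a `Tag` serializes the tag itself together with its children, which is why the searched-for `<div>` shows up. A minimal sketch, assuming `bs4` is installed and using the sample markup from the question in place of the downloaded page, that contrasts `str(tag)` with `decode_contents()`, which renders only the children:

```python
from bs4 import BeautifulSoup

# Sample markup from the question, standing in for the downloaded page.
html = '<div class="article__body"><div>One</div><div>Two</div><div>Three</div></div>'
soup = BeautifulSoup(html, "html.parser")

body = soup.find("div", class_="article__body")

# str(tag) includes the tag itself plus its children (what the question prints).
with_wrapper = str(body)
# decode_contents() serializes only the children, so the wrapper is left out.
without_wrapper = body.decode_contents()

print(with_wrapper)     # <div class="article__body"><div>One</div><div>Two</div><div>Three</div></div>
print(without_wrapper)  # <div>One</div><div>Two</div><div>Three</div>
```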
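Example using `replace_with` and `unwrap` (it uses the `html5lib` parser, which is a separate package that has to be installed):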
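```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div class=\"article__body\"><div>One</div><div>Two</div><div>Three</div></div>", "html5lib")

# Replace each classless inner <div> ({'class': ''} is meant to skip the
# article__body wrapper) with a new <p> that carries the same text.
for div in soup.find_all('div', {'class': ''}):
    p = soup.new_tag('p')
    p.string = div.text
    div.replace_with(p)

# Unwrap every non-<p> tag (the wrapper <div> plus the <html>, <head> and
# <body> tags html5lib adds), leaving their contents in place.
for tag in soup.find_all(lambda x: x.name != 'p'):
    tag.unwrap()

print(soup)
```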
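Building each `<p>` from `div.text` flattens any markup nested inside the `<div>` (a link or `<b>` would be reduced to plain text). A minimal alternative sketch, assuming `bs4` is installed and that the article markup looks like the sample in the question: rename each inner `<div>` in place by assigning to `tag.name`, then serialize only the wrapper's children with `decode_contents()`. The function name `body_as_paragraphs` and the choice of `html.parser` are illustrative, not part of the original answer.

```python
from bs4 import BeautifulSoup

SAMPLE = '<div class="article__body"><div>One</div><div>Two</div><div>Three</div></div>'


def body_as_paragraphs(html: str) -> str:
    """Hypothetical helper: return the article body's inner <div>s as <p> tags, without the wrapper."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", class_="article__body")
    if body is None:
        return ""
    # Rename every <div> nested inside the wrapper; its children
    # (text, links, inline tags) are kept untouched.
    for div in body.find_all("div"):
        div.name = "p"
    # Render only the wrapper's children, so the wrapper itself is excluded.
    return body.decode_contents()


print(body_as_paragraphs(SAMPLE))  # <p>One</p><p>Two</p><p>Three</p>
```

Because only the wrapper's children are serialized, no separate `unwrap` pass is needed, and `html.parser` does not add `<html>`/`<body>` tags that would have to be stripped.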