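I am unable to get the data from nested divs.
There are nested divs, and I need the data properly formatted.
I wrote code using the bs4 module, but I am getting this error:
BeautifulSoup: AttributeError: 'NavigableString' object has no attribute 'name'
Please help me!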
My HTML:
<div id="new">
  <div id="newDat">
    <div class="Data">
      <div class="DataNew">
        <div class="DataNew new">
          <div class="Data Left">
            <div class="name"><a class="name" href="">Jack Daniels</a></div>
            <div class="details"><span class="loc">Barcelona</span></div>
            <div class="header"><a class="looking"> Looking for meeting new people</a></div>
            <div class="ideas"><a class="ideas">I have new ideas</a></div>
            <div class="profile"> <em class="profilss"></em>MS in cs<br></div>
          </div>
          <div class="Data Right">
            <a class="phone"><span class="txt">+123123123123123231</span></a>
          </div>
        </div>
      </div>
    </div>
    <div class="DataOne">
      <div class="DataNew">
        <div class="DataNew new">
          <div class="Data Left">
            <div class="name"><a class="name" href="">Jack Daniels</a></div>
            <div class="details"><span class="loc">Barcelona</span></div>
            <div class="header"><a class="looking"> Looking for meeting new people</a></div>
            <div class="ideas"><a class="ideas">I have new ideas</a></div>
            <div class="profile"> <em class="profilss"></em>MS in cs<br></div>
          </div>
          <div class="Data Right">
            <a class="phone"><span class="txt">+123123123123123231</span></a>
          </div>
        </div>
      </div>
    </div>
    <div class="DataTwo">
      <div class="DataNew">
        <div class="DataNew new">
          <div class="Data Left">
            <div class="name"><a class="name" href="">Jack Daniels</a></div>
            <div class="details"><span class="loc">Barcelona</span></div>
            <div class="header"><a class="looking"> Looking for meeting new people</a></div>
            <div class="ideas"><a class="ideas">I have new ideas</a></div>
            <div class="profile"> <em class="profilss"></em>MS in cs<br></div>
          </div>
          <div class="Data Right">
            <a class="phone"><span class="txt">+123123123123123231</span></a>
          </div>
        </div>
      </div>
    </div>
    <div class="DataThree">
      <div class="DataNew">
        <div class="DataNew new">
          <div class="Data Left">
            <div class="name"><a class="name" href="">Jack Daniels</a></div>
            <div class="details"><span class="loc">Barcelona</span></div>
            <div class="header"><a class="looking"> Looking for meeting new people</a></div>
            <div class="ideas"><a class="ideas">I have new ideas</a></div>
            <div class="profile"> <em class="profilss"></em>MS in cs<br></div>
          </div>
          <div class="Data Right">
            <a class="phone"><span class="txt">+123123123123123231</span></a>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>
My Beautiful Soup code:
li = page.find('div', {'id': 'new'})
for tag in li:
    for i in tag.find_all("div", {"class": "name"}):
        print i.getText()
        break
    for i in tag.find_all("div", {"class": "details"}):
        print i.getText()
        break
    for i in tag.find_all("div", {"class": "header"}):
        print i.getText()
        break
    for i in tag.find_all("div", {"class": "ideas"}):
        print i.getText()
        break
    for i in tag.find_all("div", {"class": "profile"}):
        print i.getText()
        break
    for i in tag.find_all("div", {"class": "phone"}):
        print i.getText()
        break
I want output like this:
Div one
Name : Jack Daniels
Details : Barcelona
header : Looking for meeting new people
ideas : I have new ideas
profile: MS in cs
tel : +123123123123123231
Div two
Name : Jack Daniels
Details : Barcelona
header : Looking for meeting new people
ideas : I have new ideas
profile: MS in cs
tel : +123123123123123231
And so on. If there are 100 divs inside, I need output like this for each of them.
Answer 0: (score: 1)
You can do it like this. This returns the data for each div.
from bs4 import BeautifulSoup

soup = BeautifulSoup(b, "html.parser")  # b is the HTML string
# Match on the "new" class so each record is visited once: searching for
# "DataNew" would also match the outer <div class="DataNew"> wrapper.
rows = soup.find_all('div', {'class': 'new'})
for tag in rows:
    for i in tag.find_all("div", {"class": "name"}):
        print(i.getText())
        break
    for i in tag.find_all("div", {"class": "details"}):
        print(i.getText())
        break
    for i in tag.find_all("div", {"class": "header"}):
        print(i.getText())
        break
    for i in tag.find_all("div", {"class": "ideas"}):
        print(i.getText())
        break
    for i in tag.find_all("div", {"class": "profile"}):
        print(i.getText())
        break
    # The phone number is inside <a class="phone">, so read the enclosing div.
    for i in tag.find_all("div", {"class": "Data Right"}):
        print(i.getText())
        break
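
A likely cause of the AttributeError in the question is that looping directly over a tag (for tag in li) visits its direct children, and those include whitespace text nodes (NavigableString) that do not support tag searches. The sketch below is one possible way to avoid that and to print the labelled format the question asks for. It is only an illustration under stated assumptions: the names RECORD, HTML, FIELDS, extract_records and text_children are hypothetical, the sample is a shortened two-record copy of the question's HTML, and it uses CSS selectors via select and select_one rather than the find_all calls above.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical sample: one record shaped like the HTML in the question.
RECORD = """
<div class="{wrapper}">
  <div class="DataNew">
    <div class="DataNew new">
      <div class="Data Left">
        <div class="name"><a class="name" href="">Jack Daniels</a></div>
        <div class="details"><span class="loc">Barcelona</span></div>
        <div class="header"><a class="looking"> Looking for meeting new people</a></div>
        <div class="ideas"><a class="ideas">I have new ideas</a></div>
        <div class="profile"> <em class="profilss"></em>MS in cs<br></div>
      </div>
      <div class="Data Right">
        <a class="phone"><span class="txt">+123123123123123231</span></a>
      </div>
    </div>
  </div>
</div>
"""

# Two records wrapped in the same outer divs as the question's HTML.
HTML = (
    '<div id="new">\n<div id="newDat">'
    + "".join(RECORD.format(wrapper=w) for w in ("Data", "DataOne"))
    + "</div>\n</div>"
)

# (label, CSS selector) pairs; the labels follow the desired output.
FIELDS = [
    ("Name", "div.name"),
    ("Details", "div.details"),
    ("header", "div.header"),
    ("ideas", "div.ideas"),
    ("profile", "div.profile"),
    ("tel", "a.phone"),
]


def extract_records(html):
    """Return one dict per <div class="DataNew new"> block."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.select("div.DataNew.new"):
        record = {}
        for label, selector in FIELDS:
            node = block.select_one(selector)
            # A missing field becomes an empty string instead of an error.
            record[label] = node.get_text(strip=True) if node is not None else ""
        records.append(record)
    return records


def text_children(html):
    """Return the text nodes that a direct loop over div#new would visit."""
    root = BeautifulSoup(html, "html.parser").find("div", {"id": "new"})
    return [child for child in root if isinstance(child, NavigableString)]


if __name__ == "__main__":
    for number, record in enumerate(extract_records(HTML), start=1):
        print("Div %d" % number)
        for label, _ in FIELDS:
            print("%s : %s" % (label, record[label]))
```

Because select only returns Tag objects, the loop never touches text nodes, and the same code should handle 100 records as well as two. text_children is there only to show that a direct loop over the outer div does encounter NavigableString children.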