How to use BeautifulSoup
I want to extract the following values: Sunni (Traditional), All prayers including formal jum'a, Indonesian, and +61 2 9591 1593
<div class="normalLink"><table cellpadding=0 cellspacing=0 border=0><tr><td rowspan="3"><img src="http://www.salatomatic.com/images/spacer.gif" width="7" border="0"></td><td></td><td rowspan="3"><img src="http://www.salatomatic.com/images/spacer.gif" width="10" border="0"></td></tr><tr><td><img src="http://www.salatomatic.com/images/spacer.gif" width="100" height="7"></td></tr><tr><td valign="top">
<b>Denomination:</b> Sunni (Traditional)<br>
<b>Demographics:</b> Predominantly Indonesian<br>
<b>Prayers:</b> All prayers including formal jum'a</br>
<b>Language of services:</b> Indonesian<br>
<b>Imam:</b> Unknown<br>
<b>Director/President:</b> Aly Zakaria<br>
<b>Phone:</b> +61 2 9591 1593<br>
<b>Website:</b> <a href='http://www.salatomatic.com/code/fn_web.php?id=5313' target=new>Click here</a> to visit website<br>
<b>Email:</b> <a href='http://www.salatomatic.com/de.php?id=5313'>Click here</a> to send email<br>
</td></tr></table>
</div>
So far I can only get the part before the first <br>.
Code:
from bs4 import BeautifulSoup
import urllib2
url1 = "http://www.salatomatic.com/c/Sydney+168"
content1 = urllib2.urlopen(url1).read()
soup1 = BeautifulSoup(content1)
div = soup1.find('div', {'class':'normalLink'})
b = div.find('b')
print b
Answer 0 (score: 1)
This should help:
info_list = soup.get_text().split('\n')
for i in info_list:
    print i
Output:
Denomination: Sunni (Traditional)
Demographics: Predominantly Indonesian
Prayers: All prayers including formal jum'a
Language of services: Indonesian
Imam: Unknown
Director/President: Aly Zakaria
Phone: +61 2 9591 1593
Website: Click here to visit website
Email: Click here to send email
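If you only need the four values from the question rather than every line of text, one possible approach is to walk the <b> labels and read the text node that follows each one. The following is a minimal sketch, not a tested solution for the live page: it is written for Python 3 (unlike the Python 2 code above), it parses a shortened copy of the HTML from the question instead of fetching the URL, and the names `html` and `extract_fields` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumption: the live page
# still has this structure).
html = """<div class="normalLink"><table><tr><td valign="top">
<b>Denomination:</b> Sunni (Traditional)<br>
<b>Demographics:</b> Predominantly Indonesian<br>
<b>Prayers:</b> All prayers including formal jum'a</br>
<b>Language of services:</b> Indonesian<br>
<b>Phone:</b> +61 2 9591 1593<br>
<b>Website:</b> <a href='#'>Click here</a> to visit website<br>
</td></tr></table>
</div>"""


def extract_fields(markup):
    """Map each <b> label to the plain text that directly follows it."""
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div", {"class": "normalLink"})
    fields = {}
    for b in div.find_all("b"):
        label = b.get_text(strip=True).rstrip(":")
        value = b.next_sibling  # the text node right after </b>, if any
        # Skip labels followed by a tag or by whitespace only (e.g. Website).
        if isinstance(value, str) and value.strip():
            fields[label] = value.strip()
    return fields


fields = extract_fields(html)
for key in ("Denomination", "Prayers", "Language of services", "Phone"):
    print(fields[key])
```

Labels such as Website, whose value is a link rather than plain text, are skipped here; you would need to read the following <a> tag separately if you want those as well.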