Getting an address with BeautifulSoup for Python

Asked: 2013-12-03 11:47:18

Tags: python beautifulsoup scrape

I'm having trouble scraping the address from the following website. Please help me extract it.

http://www.salatomatic.com/d/Revesby+17154+Ahlus-Sunnah-Wal-Jamaah-Revesby

The source of the page linked above looks like this:

<td width="100%"><div class="titleBM">Bankstown Masjid </div>Meredith Street, Bankstown, New South Wales 2200</td>

I am trying to grab the value that comes immediately after the closing </div>.

My current code is not finished, but it looks like this:

content1 = urllib2.urlopen(url1).read()
soup1 = BeautifulSoup(content1)
div1 = soup1.find('div', {'class':'titleBM'}) #get the div where it's located
span1 = div1.find('</div>')
pos1 = span1.text       

print datetime.datetime.now(), 'street address:  ' , pos1

2 answers:

Answer 0: (score: 1)

The text is the next sibling of the <div> element, not a child of it, so use next_sibling:

from bs4 import BeautifulSoup
import urllib2
import datetime

url1 = 'http://www.salatomatic.com/d/Revesby+17154+Ahlus-Sunnah-Wal-Jamaah-Revesby'

content1 = urllib2.urlopen(url1).read()
soup1 = BeautifulSoup(content1)
div1 = soup1.find('div', {'class':'titleBM'}) #get the div where it's located
pos1 = div1.next_sibling

print datetime.datetime.now(), 'street address:  ' , pos1

Run it like this:

python2 script.py

It produces:

2013-12-03 12:55:41.306271 street address:   9-11 Mavis Street, Revesby, New South Wales 2212
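
The same next_sibling lookup can be tried offline against the fragment quoted in the question. This is only a minimal sketch for Python 3: the helper name extract_address and the SAMPLE_HTML constant are made up for illustration, and the choice of the built-in 'html.parser' backend is an assumption, not something the answer specifies.

```python
from bs4 import BeautifulSoup

# The fragment quoted in the question, inlined so no network access is needed.
SAMPLE_HTML = (
    '<td width="100%"><div class="titleBM">Bankstown Masjid </div>'
    'Meredith Street, Bankstown, New South Wales 2200</td>'
)


def extract_address(html):
    """Return the text right after <div class="titleBM">, or None if absent.

    Hypothetical helper for illustration; not part of the original answer.
    """
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', class_='titleBM')
    # Guard against pages where the div, or anything after it, is missing.
    if div is None or div.next_sibling is None:
        return None
    # In this fragment the sibling is a text node; strip surrounding whitespace.
    return str(div.next_sibling).strip()


if __name__ == '__main__':
    print('street address:', extract_address(SAMPLE_HTML))
```

Returning None when the div is missing avoids the AttributeError that div1.next_sibling would raise if find() matched nothing.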

Answer 1: (score: 0)

Because of JavaScript, you should use Selenium WebDriver to solve this:

from selenium.webdriver import Firefox

Find more information here: Link