import urllib2
from bs4 import BeautifulSoup

url = 'http://www.zillow.com/homedetails/3728-Balcary-Bay-Champaign-IL-61822/89057727_zpid/'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)
info = soup.findAll('span',{'itemtype':'http://schema.org/GeoCoordinates'}) #this tag + class combination found 4 matches, 4th one was the required one, just selecting that here
for form in info:
    b = form.find('meta')['content']
    print b
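This is a snapshot of the code I use to get latitude and longitude from Zillow. Using the span and its itemtype, I can pinpoint the element that stores the latitude and longitude. The part of the page I am parsing looks like this: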
<span itemprop="geo" itemscope="" itemtype="http://schema.org/GeoCoordinates">
<meta content="40.12938" itemprop="latitude">
<meta content="-88.30766" itemprop="longitude">
</span>
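To show that both values sit inside the same span, here is a minimal offline sketch that parses the snippet above. It uses Python 3's standard-library `html.parser` rather than BeautifulSoup (a swapped-in parser, chosen only so the example runs without network access or third-party packages). It records the `content` of every `<meta>` inside the GeoCoordinates span, keyed by `itemprop`. The class name `GeoMetaParser` is hypothetical.

```python
from html.parser import HTMLParser

# The snippet from the question, hard-coded instead of fetched from Zillow
SNIPPET = '''
<span itemprop="geo" itemscope="" itemtype="http://schema.org/GeoCoordinates">
<meta content="40.12938" itemprop="latitude">
<meta content="-88.30766" itemprop="longitude">
</span>
'''

GEO_TYPE = "http://schema.org/GeoCoordinates"


class GeoMetaParser(HTMLParser):
    """Hypothetical helper: maps itemprop -> content for every <meta>
    that appears inside a schema.org GeoCoordinates <span>."""

    def __init__(self):
        super().__init__()
        self.in_geo = False
        self.values = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("itemtype") == GEO_TYPE:
            self.in_geo = True
        elif tag == "meta" and self.in_geo and "itemprop" in attrs:
            # Keep every matching <meta>, not only the first one
            self.values[attrs["itemprop"]] = attrs.get("content")

    def handle_endtag(self, tag):
        # Simplification: any closing </span> ends the geo region,
        # so spans nested inside the geo span are not handled
        if tag == "span":
            self.in_geo = False


parser = GeoMetaParser()
parser.feed(SNIPPET)
print(parser.values["latitude"], parser.values["longitude"])
```

Looking values up by `itemprop` rather than by position means the result does not depend on the order in which the page lists the two tags.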
I can get the latitude, but not the longitude. Can someone help me get it?
Output of the code:
>>> ================================ RESTART ================================
>>>
40.12938
>>>
Expected output:
>>> ================================ RESTART ================================
>>>
40.12938 -88.30766
>>>
Answer (score: 1):
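form.find() returns only the first match, <meta content="40.12938" itemprop="latitude">. The form.find_all() method, by contrast, returns every match, and you can then collect their content attributes into a list with a list comprehension, as shown below: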
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.zillow.com/homedetails/3728-Balcary-Bay-Champaign-IL-61822/89057727_zpid/'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)
# all <span> elements whose itemtype is schema.org GeoCoordinates
info = soup.findAll('span',{'itemtype':'http://schema.org/GeoCoordinates'})
# find_all returns every <meta> inside the first matching span, not just the first one
coordinates = [i['content'] for i in info[0].find_all('meta')]
print coordinates
This produces:
[u'40.12938', u'-88.30766']
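For comparison, Python 3's standard-library `xml.etree.ElementTree` draws the same first-match versus all-matches line: `Element.find()` returns the first matching child and `Element.findall()` returns all of them. This is an analogy, not BeautifulSoup's API. The sketch assumes the question's snippet is rewritten as well-formed XML with self-closed `<meta/>` tags, because ElementTree is a strict XML parser and would reject HTML's unclosed `<meta>` tags. It also joins the strings to reproduce the one-line expected output; the float unpacking at the end assumes latitude comes before longitude, as in the snippet.

```python
import xml.etree.ElementTree as ET

# The question's snippet, rewritten as well-formed XML (self-closed <meta/>)
XHTML = """<span itemprop="geo" itemscope="" itemtype="http://schema.org/GeoCoordinates">
<meta content="40.12938" itemprop="latitude"/>
<meta content="-88.30766" itemprop="longitude"/>
</span>"""

span = ET.fromstring(XHTML)

# find() stops at the first <meta> child, like form.find('meta') in the question
first = span.find("meta").get("content")
print(first)  # 40.12938  (the question's actual output)

# findall() returns every <meta> child in document order, like find_all()
both = [m.get("content") for m in span.findall("meta")]
print(" ".join(both))  # 40.12938 -88.30766  (the question's expected output)

# Convert to floats if the values are needed as numbers
lat, lon = (float(c) for c in both)
```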