Question

url = 'http://www.zillow.com/homedetails/3728-Balcary-Bay-Champaign-IL-61822/89057727_zpid/'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)

info = soup.findAll('span',{'itemtype':'http://schema.org/GeoCoordinates'}) #this tag + class combination found 4 matches, 4th one was the required one, just selecting that here
for form in info:
    b = form.find('meta')['content']
print b

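This is a snapshot of the code I am using to get latitude and longitude information from Zillow. Using the span tag and its itemtype attribute, I can pinpoint the markup that stores the latitude and longitude. The page I am parsing contains markup similar to the following: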

<span itemprop="geo" itemscope="" itemtype="http://schema.org/GeoCoordinates">
<meta content="40.12938" itemprop="latitude">
<meta content="-88.30766" itemprop="longitude">
</span>

I can get the latitude but not the longitude. Can someone help me retrieve both values?

Code output:

>>> ================================ RESTART ================================
>>> 
40.12938
>>>

Expected output:

>>> ================================ RESTART ================================
>>> 
40.12938 -88.30766
>>>

Answer 1

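form.find() returns only the first match, <meta content="40.12938" itemprop="latitude">, which is why you only see the latitude. form.find_all() returns every match, and you can collect the values into a list with a list comprehension, as shown below: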

import urllib2
from bs4 import BeautifulSoup  # find_all() is the bs4 spelling of findAll()

url = 'http://www.zillow.com/homedetails/3728-Balcary-Bay-Champaign-IL-61822/89057727_zpid/'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)

# every span marked up as schema.org GeoCoordinates
info = soup.findAll('span',{'itemtype':'http://schema.org/GeoCoordinates'})
# take the first matching span and read the content of each <meta> inside it
coordinates = [i['content'] for i in info[0].find_all('meta')]

print coordinates

This produces:

[u'40.12938', u'-88.30766']
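
If you would rather not depend on the order of the meta tags, one option is to look each value up by its itemprop attribute. The following is only a sketch: it parses the sample markup from the question instead of fetching the live page, it assumes bs4 is installed, and the variable names (html, geo, latitude, longitude) are mine, not from the original code. It is written so that it runs under both Python 2 and Python 3.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, so no network request is needed.
html = '''
<span itemprop="geo" itemscope="" itemtype="http://schema.org/GeoCoordinates">
<meta content="40.12938" itemprop="latitude">
<meta content="-88.30766" itemprop="longitude">
</span>
'''

soup = BeautifulSoup(html, 'html.parser')

# First span marked up as schema.org GeoCoordinates.
geo = soup.find('span', {'itemtype': 'http://schema.org/GeoCoordinates'})

# Look each value up by its itemprop attribute rather than by position.
latitude = geo.find('meta', {'itemprop': 'latitude'})['content']
longitude = geo.find('meta', {'itemprop': 'longitude'})['content']

print('%s %s' % (latitude, longitude))
```

This prints the two values on one line, in the same form as the expected output in the question. On the real page you would pass the urlopen() result to BeautifulSoup instead of the html string.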
