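I'm trying to retrieve geocodes for a batch of addresses through the Google Geocoding API and then append them to the table that holds the addresses.
After two days of searching the internet I can't find a simple way to do it, and it shouldn't be that hard. In particular, I'm having trouble parsing the JSON output and appending it to the original table. I'm using Python 3.5 on Windows.
I originally pull the data from a database and load it into a DataFrame in Python. To paste it here, though, it's easier to convert it to a dictionary and then back to a DataFrame: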
data_dict={'street': {0: 'ROMULO', 1: 'SAN BARTOLOME', 2: 'GARBI', 3: 'SAN JOSE'},
'concat': {0: '3+ROMULO+CALLE+ALMERIA',
1: '5+SAN BARTOLOME+CALLE+TOLEDO',
2: '48+GARBI+CALLE+CASTELLON',
3: '30+SAN JOSE+CALLE+SANTA CRUZ DE TENERIFE'},
'number': {0: '3', 1: '5', 2: '48', 3: '30'},
'province': {0: 'ALMERIA',
1: 'TOLEDO',
2: 'CASTELLON',
3: 'SANTA CRUZ DE TENERIFE'},
'region': {0: 'ANDALUCIA',
1: 'CASTILLA LA MANCHA',
2: 'COMUNIDAD VALENCIANA',
3: 'CANARIAS'}}
Back to a DataFrame:
import pandas as pd
table=pd.DataFrame.from_dict(data_dict)
Now I retrieve the data from the Google Geocoding API:
import requests
import json
key="MyKey"
jsonout=[]
for i in table.loc[:,'concat']:
    try:
        url="https://maps.googleapis.com/maps/api/geocode/json?address=%s&key=%s" % (i, key)
        response = requests.get(url)
        jsonf = response.json()
        jsonout.append(jsonf)
    except Exception:
        continue
This is the output I get:
jsonout=[{'results': [{'address_components': [{'long_name': '3',
'short_name': '3',
'types': ['street_number']},
{'long_name': 'Calle Rómulo',
'short_name': 'Calle Rómulo',
'types': ['route']},
{'long_name': 'Adra',
'short_name': 'Adra',
'types': ['locality', 'political']},
{'long_name': 'Almería',
'short_name': 'AL',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Andalucía',
'short_name': 'AL',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '04770', 'short_name': '04770', 'types': ['postal_code']}],
'formatted_address': 'Calle Rómulo, 3, 04770 Adra, Almería, Spain',
'geometry': {'location': {'lat': 36.7593, 'lng': -2.97818},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 36.76064898029149,
'lng': -2.976831019708498},
'southwest': {'lat': 36.7579510197085, 'lng': -2.979528980291502}}},
'partial_match': True,
'place_id': 'ChIJG39VNzNOcA0R2f8Ek3E12AY',
'types': ['street_address']}],
'status': 'OK'},
{'results': [{'address_components': [{'long_name': '5',
'short_name': '5',
'types': ['street_number']},
{'long_name': 'Calle de San Bartolomé',
'short_name': 'Calle de San Bartolomé',
'types': ['route']},
{'long_name': 'Toledo',
'short_name': 'Toledo',
'types': ['locality', 'political']},
{'long_name': 'Toledo',
'short_name': 'TO',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Castilla-La Mancha',
'short_name': 'CM',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '45002', 'short_name': '45002', 'types': ['postal_code']}],
'formatted_address': 'Calle de San Bartolomé, 5, 45002 Toledo, Spain',
'geometry': {'location': {'lat': 39.8549781, 'lng': -4.026267199999999},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 39.85632708029149,
'lng': -4.024918219708497},
'southwest': {'lat': 39.85362911970849, 'lng': -4.027616180291502}}},
'partial_match': True,
'place_id': 'ChIJ4bse1aALag0RJ5RxxfyDxUI',
'types': ['street_address']}],
'status': 'OK'},
{'results': [{'address_components': [{'long_name': '48',
'short_name': '48',
'types': ['street_number']},
{'long_name': 'Carrer de Garbí',
'short_name': 'Carrer de Garbí',
'types': ['route']},
{'long_name': 'Peníscola',
'short_name': 'Peníscola',
'types': ['locality', 'political']},
{'long_name': 'Castelló',
'short_name': 'Castelló',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Comunidad Valenciana',
'short_name': 'Comunidad Valenciana',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '12598', 'short_name': '12598', 'types': ['postal_code']}],
'formatted_address': 'Carrer de Garbí, 48, 12598 Peníscola, Castelló, Spain',
'geometry': {'location': {'lat': 40.3634529, 'lng': 0.3963583},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 40.3648018802915,
'lng': 0.397707280291502},
'southwest': {'lat': 40.3621039197085, 'lng': 0.395009319708498}}},
'partial_match': True,
'place_id': 'ChIJHVNHcelGoBIRogILRMno_wk',
'types': ['street_address']},
{'address_components': [{'long_name': '48',
'short_name': '48',
'types': ['street_number']},
{'long_name': 'Carrer Garbí',
'short_name': 'Carrer Garbí',
'types': ['route']},
{'long_name': 'Vila-real',
'short_name': 'Vila-real',
'types': ['locality', 'political']},
{'long_name': 'Castelló',
'short_name': 'Castelló',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Comunidad Valenciana',
'short_name': 'Comunidad Valenciana',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '12540', 'short_name': '12540', 'types': ['postal_code']}],
'formatted_address': 'Carrer Garbí, 48, 12540 Vila-real, Castelló, Spain',
'geometry': {'bounds': {'northeast': {'lat': 39.955829, 'lng': -0.110409},
'southwest': {'lat': 39.9558231, 'lng': -0.1104261}},
'location': {'lat': 39.9558231, 'lng': -0.110409},
'location_type': 'RANGE_INTERPOLATED',
'viewport': {'northeast': {'lat': 39.9571750302915,
'lng': -0.109068569708498},
'southwest': {'lat': 39.9544770697085, 'lng': -0.111766530291502}}},
'partial_match': True,
'place_id': 'EjRDYXJyZXIgR2FyYsOtLCA0OCwgMTI1NDAgVmlsYS1yZWFsLCBDYXN0ZWxsw7MsIFNwYWlu',
'types': ['street_address']}],
'status': 'OK'},
{'results': [{'address_components': [{'long_name': '30',
'short_name': '30',
'types': ['street_number']},
{'long_name': 'Calle San José',
'short_name': 'Calle San José',
'types': ['route']},
{'long_name': 'Santa Cruz de la Palma',
'short_name': 'Santa Cruz de la Palma',
'types': ['locality', 'political']},
{'long_name': 'Santa Cruz de Tenerife',
'short_name': 'TF',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Canarias',
'short_name': 'CN',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '38700', 'short_name': '38700', 'types': ['postal_code']}],
'formatted_address': 'Calle San José, 30, 38700 Santa Cruz de la Palma, Santa Cruz de Tenerife, Spain',
'geometry': {'location': {'lat': 28.6864347, 'lng': -17.7624433},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 28.6877836802915,
'lng': -17.7610943197085},
'southwest': {'lat': 28.6850857197085, 'lng': -17.7637922802915}}},
'partial_match': True,
'place_id': 'ChIJ8ZFx6__rawwRV3dc118gEgE',
'types': ['street_address']},
{'address_components': [{'long_name': '30',
'short_name': '30',
'types': ['street_number']},
{'long_name': 'Calle San José',
'short_name': 'Calle San José',
'types': ['route']},
{'long_name': 'San Andrés',
'short_name': 'San Andrés',
'types': ['locality', 'political']},
{'long_name': 'Santa Cruz de Tenerife',
'short_name': 'Santa Cruz de Tenerife',
'types': ['administrative_area_level_4', 'political']},
{'long_name': 'Santa Cruz de Tenerife',
'short_name': 'TF',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Canarias',
'short_name': 'CN',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '38120', 'short_name': '38120', 'types': ['postal_code']}],
'formatted_address': 'Calle San José, 30, 38120 San Andrés, Santa Cruz de Tenerife, Spain',
'geometry': {'location': {'lat': 28.505875, 'lng': -16.1930036},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 28.5072239802915,
'lng': -16.1916546197085},
'southwest': {'lat': 28.5045260197085, 'lng': -16.1943525802915}}},
'partial_match': True,
'place_id': 'ChIJsfd-ITjKQQwRjFHLI0XPSok',
'types': ['street_address']}],
'status': 'OK'}]
What I ultimately want is the original table data together with the lat and lng coordinates
(i['results'][0]['geometry']['location']['lat'],
i['results'][0]['geometry']['location']['lng'])
and the formatted_address from the response.
Answer 0 (score: 1)
I use this package (geopy) for geocoding; it takes care of parsing the JSON for you.
from geopy.geocoders import GoogleV3
googleGeo = GoogleV3('googleKey')
# create a geocoded list containing geocode objects
geocoded = []
for address in mydata['location']: # assumes mydata is a pandas df
    geocoded.append(googleGeo.geocode(address)) # geocode() returns a Location object, or None if nothing is found
# append geocoded list to mydata
mydata['geocoded'] = geocoded
# create coordinates column (None where the address was not found)
mydata['coords'] = mydata['geocoded'].apply(lambda x: (x.latitude, x.longitude) if x is not None else None)
# if you want to split out your lat and long then do
# mydata['lat'] = mydata['geocoded'].apply(lambda x: x.latitude)
# mydata['long'] = mydata['geocoded'].apply(lambda x: x.longitude)
Based on the comments you shared, if you are using the Google API without an API key, it may help to include a random pause between geocoding calls.
from time import sleep
from random import randint
from geopy.geocoders import GoogleV3
googleGeo = GoogleV3()
def geocode(address):
    location = googleGeo.geocode(address)
    sleep(randint(5,10)) # give the API a break
    return location
Then use this custom function for geocoding.
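As a rough sketch of what "using the custom function" could look like, the snippet below applies a `geocode` function to every row with `Series.apply` and then pulls out latitude and longitude. So that it runs offline, `geocode` here is a hypothetical stand-in backed by a small lookup table, and `FakeLocation` is an assumed substitute for the object geopy returns; in real use you would call the throttled function defined above instead.

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass(frozen=True)
class FakeLocation:
    """Offline stand-in for the object geopy returns (hypothetical)."""
    address: str
    latitude: float
    longitude: float


# Hypothetical lookup table so the example runs without network access.
_KNOWN = {
    "Calle Rómulo 3, Adra": FakeLocation(
        "Calle Rómulo, 3, 04770 Adra, Almería, Spain", 36.7593, -2.97818
    ),
}


def geocode(address: str) -> Optional[FakeLocation]:
    """Stand-in for the throttled geocode(); returns None when nothing is found."""
    return _KNOWN.get(address)


mydata = pd.DataFrame({"location": ["Calle Rómulo 3, Adra", "no such address"]})

# One call per row; the new column holds location objects or None.
mydata["geocoded"] = mydata["location"].apply(geocode)

# Guard against None so a failed lookup does not raise AttributeError.
mydata["lat"] = mydata["geocoded"].apply(
    lambda loc: loc.latitude if loc is not None else None
)
mydata["long"] = mydata["geocoded"].apply(
    lambda loc: loc.longitude if loc is not None else None
)
```

Rows that could not be geocoded end up with missing values in `lat` and `long`, which makes them easy to filter out or retry later.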
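Building on my previous section, you can even use multiple map API services. Here is a function I built for one of my projects; it tries Nominatim's API first and falls back to Google's API if Nominatim raises an error or returns nothing: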
from geopy.geocoders import Nominatim, GoogleV3
from geopy.exc import GeocoderTimedOut, GeocoderAuthenticationFailure
from random import randint
from time import sleep
# Nominatim geolocator; recent geopy releases require a custom user_agent (the name here is a placeholder)
nomiGeo = Nominatim(user_agent='my-geocoder')
googleGeo = GoogleV3('myKey') # Google Maps v3 API geolocator
def geocode(address):
    """Geocode an address.
    Args:
        address (str): the physical address
    Returns:
        geopy Location object, or None if every attempt failed
    """
    location = None
    attempt = 0
    useGoogle = False # set to True to use Google only
    while (location is None) and (attempt <= 8):
        try:
            attempt += 1
            if useGoogle:
                location = googleGeo.geocode(address, timeout=10)
            else:
                location = nomiGeo.geocode(address, timeout=10)
                if location is None: # Nominatim found nothing, fall back to Google
                    useGoogle = True
                    location = googleGeo.geocode(address, timeout=10)
            sleep(randint(5, 10)) # Give the API a break
        except GeocoderAuthenticationFailure:
            print('Error: GeocoderAuthenticationFailure while geocoding {} during attempt #{}'.format(address, attempt))
            if attempt % 2 == 0: # switch between services for every attempt
                useGoogle = True
            else:
                useGoogle = False
            sleep(60)
        except GeocoderTimedOut:
            sleep(randint(3, 5)) # Give API a break
            print('Error: GeocoderTimedOut while geocoding {} during attempt #{}'.format(address, attempt))
    return location
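Note that I also import a couple of package-specific exceptions: in my experience Nominatim sometimes throws seemingly random errors, and these are the two I ran into. Also, in my experience with both APIs, Google seems able to interpolate coordinates even when it can't find the exact address, whereas Nominatim only returns a result if the address is in its database.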
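If you would rather stay with the raw responses collected in the question instead of switching to geopy, something along these lines might work. It is only a sketch: `extract_first_result` is a hypothetical helper name, the sample `jsonout` is trimmed down from the output shown in the question (plus one made-up empty response), and it assumes `jsonout` holds exactly one response per table row, in row order.

```python
import pandas as pd


def extract_first_result(response):
    """Return (lat, lng, formatted_address) from one Geocoding API response dict.

    Hypothetical helper: takes only the first result and returns Nones
    when the status is not 'OK' or the result list is empty.
    """
    if response.get("status") != "OK" or not response.get("results"):
        return (None, None, None)
    first = response["results"][0]
    location = first["geometry"]["location"]
    return (location["lat"], location["lng"], first["formatted_address"])


# Trimmed sample responses; the second one stands for an address with no match.
jsonout = [
    {
        "results": [
            {
                "formatted_address": "Calle Rómulo, 3, 04770 Adra, Almería, Spain",
                "geometry": {"location": {"lat": 36.7593, "lng": -2.97818}},
            }
        ],
        "status": "OK",
    },
    {"results": [], "status": "ZERO_RESULTS"},
]

table = pd.DataFrame({"concat": ["3+ROMULO+CALLE+ALMERIA", "1+NO+SUCH+STREET"]})

# Build the new columns on the table's own index so rows line up by position.
coords = pd.DataFrame(
    [extract_first_result(r) for r in jsonout],
    columns=["lat", "lng", "formatted_address"],
    index=table.index,
)
table = table.join(coords)
```

Two caveats. The loop in the question skips a row with `continue` when a request fails, so `jsonout` can end up shorter than the table and out of step with it; appending a placeholder such as `{'results': [], 'status': 'ERROR'}` in the `except` branch would keep the two aligned. Also, several of the responses in the question contain more than one result and are flagged `partial_match`, so taking `results[0]` blindly may pick the wrong town.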