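Finding locations with the Geocoder API in pandas

Time: 2019-03-17 09:55:02

Tags: python pandas google-geocoder geocoder

I have a column of city names in a pandas DataFrame. For each city I need to look up its address and store it in a separate column of the same DataFrame. The city column also contains NaN values. I can get the address for a single location/city name on its own, but it does not work for the pandas DataFrame.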

import pandas as pd
data = [['madurai',10],['NaN',12],['hosur',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
from geopy.geocoders import Nominatim
geolocator = Nominatim()
for i in df.Name:
    if i == "NaN":
       continue
    loc = geolocator.geocode(i)
address = loc.address
print(address)

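This runs on the DataFrame, but it returns only the last address instead of the addresses for all three cities. If I change the order as shown below,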

data = [['Nan',10],['Madurai',12],['hosur',13]]
df = pd.DataFrame(data,columns=['Name','Age'])

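I get the error: GeocoderTimedOut: Service timed out

Questions:  1. I want the results (addresses) saved in a column.  2. How do I handle NaN values?

3 answers:

Answer 0: (score: 0)

You can add a column with the addresses like this: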

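import pandas as pd
from geopy.geocoders import Nominatim

data = [['madurai', 10], ['NaN', 12], ['hosur', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Recent geopy versions require an explicit user agent; 'test' is a placeholder.
geolocator = Nominatim(user_agent='test')

for i in df.Name:
    # Skip the placeholder string used for missing cities.
    if i == "NaN":
        continue
    loc = geolocator.geocode(i)
    # geocode() returns None when nothing is found; otherwise store the address text.
    df.loc[df.Name == i, 'Address'] = loc.address if loc is not None else None

print(df)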
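
The loop above only skips the literal string "NaN", so a real missing value (None or a float NaN, which is what pandas usually stores) would still be sent to the geocoder. Below is a minimal sketch of one way to guard against that. The helpers `is_missing` and `geocode_names` are hypothetical names, not part of pandas or geopy, and `fake_geocode` is only a stand-in for a real geocoder so that the example runs offline.

```python
import math


def is_missing(value):
    """Return True for None, float NaN, and the placeholder string 'NaN' (any case)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip().lower() == "nan"


def geocode_names(names, geocode):
    """Hypothetical helper: return one address (or None) per input name.

    Missing names are skipped, and each distinct name is looked up only once.
    """
    cache = {}
    addresses = []
    for name in names:
        if is_missing(name):
            addresses.append(None)
            continue
        if name not in cache:
            cache[name] = geocode(name)
        addresses.append(cache[name])
    return addresses


def fake_geocode(name):
    """Stand-in for a real geocoder; the address strings are simplified for the demo."""
    known = {
        "madurai": "Madurai, Tamil Nadu, India",
        "hosur": "Hosur, Tamil Nadu, India",
    }
    return known.get(name.lower())


names = ["madurai", float("nan"), "NaN", "hosur", "madurai"]
addresses = geocode_names(names, fake_geocode)
print(addresses)
```

Because the returned list has one entry per input row, it could be assigned directly, for example `df['Address'] = geocode_names(df['Name'], my_geocode)`, where `my_geocode` is whatever callable wraps the real service and returns an address string or None.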

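Answer 1: (score: 0)

You only get the last value because loc is overwritten on every iteration of the loop, and the address is read only once, after the loop has finished. The GeocoderTimedOut: Service timed out error occurs because you are sending many requests to the server in quick succession. You should add a sleep between requests. If you still get this error, see: Link - Avoid time out

Try: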

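import time

import pandas as pd
from geopy.geocoders import Nominatim

data = [['madurai', 10], ['NaN', 12], ['hosur', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
geolocator = Nominatim(user_agent='test')

address = []
for i in df.Name:
    # Keep the placeholder for missing cities without calling the service.
    if i == "NaN":
        address.append('NaN')
        continue
    # Pause before each real request so the server is not flooded.
    time.sleep(3)
    # geocode() returns a Location object, or None if nothing is found.
    address.append(geolocator.geocode(i))

df['address'] = address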
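
If timeouts still occur despite the sleep, retrying the failed request after a growing pause is one common approach. The sketch below is only an illustration: `call_with_retries` is a hypothetical helper, not a geopy function, and `flaky_lookup` simulates a service that times out twice so the example runs without network access.

```python
import time


def call_with_retries(func, arg, retry_on=(TimeoutError,), attempts=3,
                      base_delay=1.0, sleep=time.sleep):
    """Hypothetical helper: call func(arg), retrying on the given exceptions.

    Waits base_delay, then 2 * base_delay, and so on between attempts,
    and re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return func(arg)
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Offline demo: a fake lookup that times out twice before succeeding.
calls = {"count": 0}


def flaky_lookup(name):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return f"address of {name}"


# Record the requested waits instead of actually sleeping.
waits = []
result = call_with_retries(flaky_lookup, "hosur", sleep=waits.append)
print(result, waits)
```

With geopy, the same helper could presumably be used as `call_with_retries(geolocator.geocode, city, retry_on=(GeocoderTimedOut,))`, with `GeocoderTimedOut` imported from `geopy.exc`.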

Answer 2: (score: 0)

I introduced a time delay between requests as shown below, plus a few lines to display a progress bar.

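from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from tqdm import tqdm

# 'final' is an existing DataFrame with a 'city' column.
# Recent geopy versions require an explicit user agent; 'test' is a placeholder.
geolocator = Nominatim(user_agent='test')
# Wait at least one second between consecutive requests.
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

# Without a progress bar:
final['Geolocation'] = final['city'].apply(geocode)

# With a progress bar (use this instead of the line above, not in addition to it):
tqdm.pandas()
final['Geolocation'] = final['city'].progress_apply(geocode)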

It works now.