Calculating distances between latitude/longitude points from a pandas DataFrame

Time: 2019-03-18 18:34:41

Tags: python-3.x pandas latitude-longitude

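I am a beginner with pandas. I have a DataFrame with venue IDs and their latitudes and longitudes as columns, and I need to build a separate DataFrame containing the distance between every pair of venues. There are 38333 venues, and running a 38333 * 38333 loop seems impractical. Can anyone suggest a better solution? (Image: dataframe snapshot)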

1 answer:

Answer 0 (score: 0)

Here is an example of what you could do:

import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    # Great-circle distance in km; inputs are in degrees (scalars or NumPy arrays)
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # 6367 km approximates the Earth's radius
    return km

# =========== create random longitudes and latitudes for the example
from random import uniform

import pandas as pd

def newpoint():  # returns (longitude, latitude)
    return uniform(-180, 180), uniform(-90, 90)

n = 5  # number of random points
points = [newpoint() for _ in range(n)]
lon = [x for x, y in points]
lat = [y for x, y in points]
ids = list(range(n))  # avoid shadowing the built-in id()
df = pd.DataFrame({'id': ids, 'Latitude': lat, 'Longitude': lon})
print(df)

Example output of df:

   id   Latitude   Longitude
0   0  30.052750  -35.294843
1   1  60.588742 -124.559868
2   2 -23.872878  -21.469725
3   3 -67.234086  -95.865194
4   4 -26.889749 -179.668853

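def distance_ids(orig, dest):
    # dist[k][i] holds the distance between id i and id (i + k) % n,
    # so look up row |orig - dest| at position min(orig, dest)
    return dist[np.abs(orig - dest)][np.amin([orig, dest])]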

lat = df['Latitude'].values
lon = df['Longitude'].values

# if there is enough memory, you could calculate the distances between all points
dist = []
for index in range(len(lat)):
    # d[i] is the distance between point i and point (i + index) % n
    d = haversine_np(np.roll(lon, -index), np.roll(lat, -index), lon, lat)
    # you could store the result in the dataframe
    # (the column label is literal only for row 0)
    df[f'0 to {index}'] = pd.Series(d)
    # or you could append the result to one big list
    dist.append(d)
    # in that case, you can look up the distance between 2 ids
    # with the function distance_ids(3, 4), for example
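
The lookup in distance_ids depends on how np.roll lines the points up. As a minimal sketch of that layout (the helper name roll_partners is hypothetical and not part of the answer), the following uses plain ids instead of coordinates to show which id ends up paired with which at each offset:

```python
import numpy as np

def roll_partners(n):
    """Hypothetical helper: partners[k, i] is the id paired with id i at offset k."""
    ids = np.arange(n)
    # np.roll(ids, -k) shifts the ids left by k, so position i holds (i + k) % n
    return np.array([np.roll(ids, -k) for k in range(n)])

partners = roll_partners(5)
print(partners[1])  # [1 2 3 4 0]

# Same indexing as distance_ids(3, 4): row |orig - dest|, position min(orig, dest)
orig, dest = 3, 4
k, i = abs(orig - dest), min(orig, dest)
print(partners[k, i])  # 4
```

Because position i at offset k always pairs i with (i + k) % n, taking k as the absolute difference and i as the smaller id lands on the larger id, which is why the lookup works in either argument order.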

# you could also calculate just the distances between one id and all other ids,
# for id = 2, for example
index = 2
lat1 = np.repeat(lat[index], len(lat))
lon1 = np.repeat(lon[index], len(lon))
# dist_index contains an array of all distances from id 2 to all other ids
# (arguments must follow the order lon1, lat1, lon2, lat2)
dist_index = haversine_np(lon1, lat1, lon, lat)
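
For the 38333 venues in the question, a full float64 distance matrix has 38333 * 38333 entries, roughly 12 GB, so it may not fit in memory at once. One possible alternative, sketched here under assumptions rather than taken from the answer, is to use NumPy broadcasting and produce the matrix in row blocks. The function name pairwise_haversine_chunks and the chunk_size parameter are hypothetical; the radius matches the 6367 km used above.

```python
import numpy as np

EARTH_RADIUS_KM = 6367  # same approximation as haversine_np in the answer

def pairwise_haversine_chunks(lat_deg, lon_deg, chunk_size=1000):
    """Yield (start, block), where block[i, j] is the distance in km
    between point start + i and point j. Inputs are in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    for start in range(0, len(lat), chunk_size):
        stop = start + chunk_size
        # Column vector minus row vector broadcasts to a (rows, n) block
        dlat = lat[start:stop, None] - lat[None, :]
        dlon = lon[start:stop, None] - lon[None, :]
        a = (np.sin(dlat / 2.0) ** 2
             + np.cos(lat[start:stop, None]) * np.cos(lat[None, :])
             * np.sin(dlon / 2.0) ** 2)
        # clip guards against rounding pushing a slightly outside [0, 1]
        yield start, 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Small usage example: two points on the equator and the North Pole
lat = np.array([0.0, 0.0, 90.0])
lon = np.array([0.0, 90.0, 0.0])
full = np.vstack([block for _, block in pairwise_haversine_chunks(lat, lon, chunk_size=2)])
print(full.round(1))
```

Each block can be processed and discarded (for example, keeping only the nearest few venues per row) instead of being stacked, which keeps peak memory at about chunk_size * n values.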