Maintaining identifiers while finding the distance between latitudes and longitudes

Time: 2019-03-19 18:35:49

Tags: python python-3.x pandas geopy

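I have two sets of latitudes and longitudes that I want to combine with a Cartesian join, then find the distance between each pair. Values in number or other_number can repeat (i.e., an identifier may have two locations/addresses).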

import pandas as pd

d = {'number': ['100', '101'], 'lat': ['40.6892', '41.8902'], 'long': ['74.0445','12.4922']}
d2 = {'other_number': ['200', '201'], 'lat': ['37.8199', '43.8791'], 'long': ['122.4783','103.4591']}
data = pd.DataFrame(data=d)
data2 = pd.DataFrame(data=d2)

I am currently converting the lat/long fields into lists of tuples...

tuple_list_1 = list(zip(data.lat.astype(float), data.long.astype(float)))
tuple_list_2 = list(zip(data2.lat.astype(float), data2.long.astype(float)))

...and then performing the Cartesian join with a generator.

gen = ([x, y] for x in tuple_list_1 for y in tuple_list_2)

Finally, I find the distances with a simple loop:

from geopy.distance import geodesic

for u, v in gen:
    dist = geodesic(u, v).miles
    print(dist)

Ultimately, I want to tie each distance back to the original information (i.e., number and other_number). This is the result I am after:

d3 = {'number': ['100', '100', '101', '101'],
      'address': ['Statue of Liberty', 'Statue of Liberty', 'Colosseum', 'Colosseum'],
      'other_number': ['200', '201', '200', '201'],
      'other_address': ['Golden Gate Bridge', 'Mount Rushmore', 'Golden Gate Bridge', 'Mount Rushmore'],
      'distance': [2572.262967759492, 1515.3455804766047, 5400.249562015358, 4365.4386483486205]
      }
data3 = pd.DataFrame(data=d3)

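How can I retrieve the distances efficiently (I suspect that iterating over the generator is not very efficient) and then tie the results to the identifier fields in a final DataFrame?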

1 Answer:

Answer 0 (score: 1)

import pandas as pd

d = {'number': ['100', '101'], 'lat': ['40.6892', '41.8902'], 'long': ['74.0445','12.4922']}
d2 = {'other_number': ['200', '201'], 'lat': ['37.8199', '43.8791'], 'long': ['122.4783','103.4591']}
data = pd.DataFrame(data=d)
data2 = pd.DataFrame(data=d2)

# Perform cartesian product
data['key'] = 0
data2['key'] = 0
df = pd.merge(data, data2, on='key', how='outer')
df = df.drop('key', axis=1)

# Calculate distance
from geopy.distance import geodesic
df['distance'] = df.apply(lambda row: geodesic((row['lat_x'], row['long_x']), (row['lat_y'], row['long_y'])).miles, axis=1)

df looks like this:

  number    lat_x   long_x other_number    lat_y    long_y     distance
0    100  40.6892  74.0445          200  37.8199  122.4783  2572.262968
1    100  40.6892  74.0445          201  43.8791  103.4591  1515.345580
2    101  41.8902  12.4922          200  37.8199  122.4783  5400.249562
3    101  41.8902  12.4922          201  43.8791  103.4591  4365.438648
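
If you would rather keep the generator approach from the question, the identifiers can travel with the coordinates through the Cartesian product. The following is only a minimal sketch under stated assumptions: the haversine_miles helper and the EARTH_RADIUS_MILES constant are hypothetical stand-ins so that the block runs with the standard library alone. They give a great-circle approximation, not the geodesic distance shown above; geodesic(p1, p2).miles from geopy can be swapped in to reproduce those figures.

```python
import math
from itertools import product

d = {'number': ['100', '101'], 'lat': ['40.6892', '41.8902'], 'long': ['74.0445', '12.4922']}
d2 = {'other_number': ['200', '201'], 'lat': ['37.8199', '43.8791'], 'long': ['122.4783', '103.4591']}

# Mean Earth radius in miles; an assumption of this sketch, not a value from the question.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, long) points given in degrees.

    Hypothetical stand-in for geopy's geodesic(p, q).miles.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Pair each identifier with its coordinates instead of zipping the coordinates alone.
left = [(n, (float(la), float(lo))) for n, la, lo in zip(d['number'], d['lat'], d['long'])]
right = [(n, (float(la), float(lo))) for n, la, lo in zip(d2['other_number'], d2['lat'], d2['long'])]

# product() yields every (left, right) combination, so each distance keeps both identifiers.
rows = [
    {'number': n1, 'other_number': n2, 'distance': haversine_miles(p1, p2)}
    for (n1, p1), (n2, p2) in product(left, right)
]
```

Passing rows to pd.DataFrame(rows) then gives one row per pair with both identifiers attached. This still loops in Python, so it is unlikely to be faster than the merge above; it only shows how to avoid losing the identifiers.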

If you would rather not build the Cartesian product in pandas through a new key column, there are other ways; see "cartesian product in pandas".
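
One such alternative, sketched here on the assumption that pandas 1.2 or later is installed, is the how='cross' option of pd.merge, which builds the Cartesian product without a helper column. The variable name pairs is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'number': ['100', '101'],
                     'lat': ['40.6892', '41.8902'],
                     'long': ['74.0445', '12.4922']})
data2 = pd.DataFrame({'other_number': ['200', '201'],
                      'lat': ['37.8199', '43.8791'],
                      'long': ['122.4783', '103.4591']})

# how='cross' (assumes pandas >= 1.2) pairs every row of data with every row of data2.
# The overlapping lat/long columns receive the default _x and _y suffixes.
pairs = pd.merge(data, data2, how='cross')
```

The resulting frame has the same lat_x, long_x, lat_y and long_y columns as df above, so the same distance calculation should apply to it unchanged.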