我有两个数据帧: 1)包含供应商列表及其Lat,Long坐标
sup_essential = pd.DataFrame({'supplier': ['A','B','C'],
'coords': [(51.1235,-0.3453),(52.1245,-0.3423),(53.1235,-1.4553)]})
2)商店列表及其纬度,长坐标
stores_essential = pd.DataFrame({'storekey': [1,2,3],
'coords': [(54.1235,-0.6553),(49.1245,-1.3423),(50.1235,-1.8553)]})
我想创建一个输出表,其中包含:store,store_coordinates,supplier,supplier_coordinates,每个商店和供应商组合的距离。
我目前有:
test=[]
for row in sup_essential.iterrows():
for row in stores_essential.iterrows():
r = sup_essential['supplier'],stores_essential['storeKey']
test.append(r)
但这只是重复了所有的值
答案 0 :(得分:0)
来源DF
from sklearn.neighbors import DistanceMetric
dist = DistanceMetric.get_metric('haversine')
m = pd.merge(sup.assign(x=0), stores.assign(x=0), on='x', suffixes=['1','2']).drop('x',1)
d1 = sup[['coords']].assign(lat=sup.coords.str[0], lon=sup.coords.str[1]).drop('coords',1)
d2 = stores[['coords']].assign(lat=stores.coords.str[0], lon=stores.coords.str[1]).drop('coords',1)
m['dist_km'] = np.ravel(dist.pairwise(np.radians(d1), np.radians(d2)) * 6367)
## -- End pasted text --
<强>解决方案:强>
In [135]: m
Out[135]:
coords1 supplier coords2 storekey dist_km
0 (51.1235, -0.3453) A (54.1235, -0.6553) 1 334.029670
1 (51.1235, -0.3453) A (49.1245, -1.3423) 2 233.213416
2 (51.1235, -0.3453) A (50.1235, -1.8553) 3 153.880680
3 (52.1245, -0.3423) B (54.1235, -0.6553) 1 223.116901
4 (52.1245, -0.3423) B (49.1245, -1.3423) 2 340.738587
5 (52.1245, -0.3423) B (50.1235, -1.8553) 3 246.116984
6 (53.1235, -1.4553) C (54.1235, -0.6553) 1 122.997130
7 (53.1235, -1.4553) C (49.1245, -1.3423) 2 444.459052
8 (53.1235, -1.4553) C (50.1235, -1.8553) 3 334.514028
<强>结果:强>
{{1}}