I'm trying to compare each row of df1 against every row of df2, then create a new column in df1 that stores the minimum of all the resulting values.
euclid_dist = []
short_dist = []
lat_sc = shopping_centers['lat']
long_sc = shopping_centers['lng']
for i, j in zip(lat_sc, long_sc):
    for lat_real, long_real in zip(real_estate['lat'], real_estate['lng']):
        # only the latitude difference is used here; j and long_real are unused
        euclid_dist.append(lat_real - i)
    short_dist.append(min(euclid_dist))
    euclid_dist = []
Desired result:
df1['shortest'] = min(df1['lat'] - each lat of df2)
df1['nearest sc'] = the corresponding sc_id
Edited to include sc_id in df1.
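Answer 0 (score: 0)
This can get computationally expensive as df2 grows, but for each row of df1 you can compute the distance to every row of df2 and keep the minimum (this could be done more efficiently).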
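import numpy as np

# Reference coordinates from df2, extracted once as NumPy arrays
ref_lats = df2["lat"].values
ref_longs = df2["lng"].values

def find_euclid_dist(row):
    # Distance from this df1 row to every df2 row; keep the smallest
    dist_arr = np.sqrt((ref_lats - row["lat"])**2 + (ref_longs - row["lng"])**2)
    return np.min(dist_arr)
df1["shortest"] = df1.apply(find_euclid_dist, axis=1)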
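The answer notes that this could be done more efficiently. One possible approach, sketched here under assumptions, is to replace the row-wise apply with NumPy broadcasting so that all pairwise distances are computed in a single array operation. The sample frames and the helper name shortest_distances are hypothetical and only serve to make the example runnable; the column names lat, lng and sc_id follow the question. The intermediate array has one entry per (df1 row, df2 row) pair, so memory use grows with the product of the two lengths.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df1 = pd.DataFrame({"lat": [0.0, 3.0], "lng": [0.0, 4.0]})
df2 = pd.DataFrame({"sc_id": ["A", "B"], "lat": [0.0, 3.0], "lng": [1.0, 0.0]})

def shortest_distances(points, refs):
    """For each row of points, return the smallest Euclidean distance to any row of refs."""
    p = points[["lat", "lng"]].to_numpy()[:, None, :]  # shape (n, 1, 2)
    r = refs[["lat", "lng"]].to_numpy()[None, :, :]    # shape (1, m, 2)
    dist = np.sqrt(((p - r) ** 2).sum(axis=2))         # shape (n, m), all pairs
    return dist.min(axis=1)                            # minimum over df2 rows

df1["shortest"] = shortest_distances(df1, df2)
print(df1)
```

Note that this treats latitude and longitude as plain planar coordinates, exactly as the answer's code does.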
Answer 1 (score: 0)
How about using cdist from scipy?
from scipy.spatial.distance import cdist
# Pairwise distances (rows: df1, columns: df2), then the minimum along each row
df1['shortest'] = cdist(df1[['lat', 'lng']], df2[['lat', 'lng']], metric='euclidean').min(axis=1)
print(df1)
Returns:
         lat        lng          addr_street    shortest
0 -37.980523 -37.980523     37 Scarlet Drive  183.022436
1 -37.776161 -37.776161  999 Heidelberg Road  182.817951
2 -37.926238 -37.926238        47 New Street  182.968096
3 -37.800056 -37.800056  3/113 Normanby Road  182.841849
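The question also asks for the sc_id of the nearest shopping centre, which the cdist call above does not return. A minimal sketch of one way to get it, assuming df2 has an sc_id column as the question describes: keep the full distance matrix and use argmin to find the position of the closest df2 row for each df1 row. The sample data below is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Hypothetical sample data; column names follow the question.
df1 = pd.DataFrame({"lat": [0.0, 3.0], "lng": [0.0, 4.0]})
df2 = pd.DataFrame({"sc_id": ["SC_A", "SC_B"], "lat": [0.0, 3.0], "lng": [1.0, 0.0]})

# Full pairwise distance matrix: one row per df1 row, one column per df2 row.
dist = cdist(df1[["lat", "lng"]].to_numpy(), df2[["lat", "lng"]].to_numpy(),
             metric="euclidean")

# Position (0-based, within df2) of the closest shopping centre for each df1 row.
nearest = dist.argmin(axis=1)

# Pick the minimum distance and the matching sc_id using those positions.
df1["shortest"] = dist[np.arange(len(df1)), nearest]
df1["nearest sc"] = df2["sc_id"].to_numpy()[nearest]
print(df1)
```

Indexing through to_numpy() uses positions rather than index labels, so this works even when df2 does not have a default 0..n-1 index.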