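I'm a beginner with pandas. I have a DataFrame with venue IDs and their latitude and longitude as columns, and I need to build a separate DataFrame with the distance between every pair of venues. There are 38333 venues, and running a 38333 * 38333 loop seems impractical. Can anyone suggest a better solution?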
Answer 0 (score: 0)
Here is an example of what you could do:
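```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in decimal degrees.

    Works element-wise on NumPy arrays, so many pairs are computed in one call.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # 6367 km is the Earth radius used here
    return km
```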
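A quick sanity check of `haversine_np` (a sketch, not part of the original answer): one degree of latitude along a meridian should come out as 6367 × π / 180 ≈ 111.1 km, and the distance from the equator to a pole should be a quarter of a great circle, 6367 × π / 2 km.

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    # Same formula as above; inputs in decimal degrees, result in km.
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

# One degree of latitude along the prime meridian
one_degree = haversine_np(0.0, 0.0, 0.0, 1.0)
# Equator to North Pole along the prime meridian
quarter = haversine_np(0.0, 0.0, 0.0, 90.0)
print(one_degree, quarter)
```

Because the function only uses NumPy ufuncs, the same call works unchanged when the arguments are whole arrays, which is what the rest of the answer relies on.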
```python
# =========== just to create random lat and long
from random import uniform
import pandas as pd

def newpoint():
    """Return a random (longitude, latitude) pair."""
    return uniform(-180, 180), uniform(-90, 90)

n = 5  # choose the number of random points
points = [newpoint() for _ in range(n)]  # a list, so it can be read twice
lon = [x for x, y in points]
lat = [y for x, y in points]
ids = list(range(n))  # avoid shadowing the built-in id()
df = pd.DataFrame({'id': ids, 'Latitude': lat, 'Longitude': lon})
print(df)
```
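If you only need test data, the same kind of DataFrame can be generated without a Python loop. This sketch swaps in NumPy's `np.random.default_rng` for `random.uniform`; the seed of 0 is an arbitrary choice that only makes runs reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # arbitrary fixed seed for reproducibility
n = 5  # number of random points
df = pd.DataFrame({
    'id': np.arange(n),
    'Latitude': rng.uniform(-90, 90, n),     # n draws in [-90, 90)
    'Longitude': rng.uniform(-180, 180, n),  # n draws in [-180, 180)
})
print(df)
```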
Sample output of df:
```
   id   Latitude   Longitude
0   0  30.052750  -35.294843
1   1  60.588742 -124.559868
2   2 -23.872878  -21.469725
3   3 -67.234086  -95.865194
4   4 -26.889749 -179.668853
```
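```python
def distance_ids(orig, dest):
    """Look up the distance between two ids in the precomputed `dist` list.

    Assumes the ids are the row positions 0 .. n-1, as in the example df.
    """
    # dist[k][i] is the distance between id i and id (i + k) % n,
    # so the pair (orig, dest) sits at k = |orig - dest|, i = min(orig, dest).
    return dist[abs(orig - dest)][min(orig, dest)]

lat = df['Latitude'].values
lon = df['Longitude'].values

# If there is enough memory, you could calculate the distances between all points.
# That is n * n floats: for 38333 points about 11.8 GB of float64 (per copy).
dist = []
for index in range(len(lat)):
    # Shift the coordinates by `index` positions and compare with the originals:
    # d[i] is the distance between id i and id (i + index) % n.
    d = haversine_np(np.roll(lon, -index), np.roll(lat, -index), lon, lat)
    # you could include the result in the dataframe
    # (row i of column 'offset k' holds the distance from id i to id (i + k) % n)
    df[f'offset {index}'] = d
    # or you could append the result to a big list
    dist.append(d)

# In the list case, you can look up the distance between two ids
# with the function, for example: distance_ids(3, 4)
```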
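To check the index arithmetic of `distance_ids`, and to show the broadcasting alternative to the loop, here is a self-contained sketch on a handful of random points (the seed and `n = 6` are arbitrary). Broadcasting a column vector against a row vector gives the whole n × n matrix in one call; like the loop, it needs about 11.8 GB of float64 memory for 38333 points, so on its own it is only practical for smaller n.

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    # Same formula as in the answer; inputs in decimal degrees, result in km.
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

rng = np.random.default_rng(1)  # arbitrary seed
n = 6
lat = rng.uniform(-90, 90, n)
lon = rng.uniform(-180, 180, n)

# Full n x n matrix in one call: shape (n, 1) broadcast against shape (1, n).
full = haversine_np(lon[:, None], lat[:, None], lon[None, :], lat[None, :])

# The roll-based loop from the answer: dist[k][i] is the distance between id i and id (i + k) % n.
dist = [haversine_np(np.roll(lon, -k), np.roll(lat, -k), lon, lat) for k in range(n)]

def distance_ids(orig, dest):
    return dist[abs(orig - dest)][min(orig, dest)]

# Both approaches agree for every ordered pair of ids.
for i in range(n):
    for j in range(n):
        assert np.isclose(distance_ids(i, j), full[i, j])
print(full.shape)
```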
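```python
# You could also just calculate the distances between one id and all other ids,
# for example for id = 2:
index = 2
lat1 = np.repeat(lat[index], len(lat))
lon1 = np.repeat(lon[index], len(lat))
# dist_index contains an array of all distances from id 2 to all other ids.
# Arguments go in the order (lon1, lat1, lon2, lat2). Thanks to broadcasting,
# haversine_np(lon[index], lat[index], lon, lat) gives the same result without np.repeat.
dist_index = haversine_np(lon1, lat1, lon, lat)
```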
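If the full matrix does not fit in memory, one option is to extend the one-id-against-all idea to a block of ids at a time and keep only the pairs you actually need, which also gives the separate long-form DataFrame the question asks for. The helper `pairs_within`, its `max_km` threshold and its `chunk` size are hypothetical names for this sketch, and it assumes the `id`, `Latitude` and `Longitude` columns used above.

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2):
    # Same formula as in the answer; inputs in decimal degrees, result in km.
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

def pairs_within(df, max_km, chunk=1000):
    """Hypothetical helper: (id_a, id_b, km) for every pair closer than max_km."""
    lat = df['Latitude'].to_numpy()
    lon = df['Longitude'].to_numpy()
    ids = df['id'].to_numpy()
    n = len(df)
    parts = []
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Distances from rows start..stop-1 to every point: shape (stop - start, n)
        block = haversine_np(lon[start:stop, None], lat[start:stop, None],
                             lon[None, :], lat[None, :])
        local_rows, cols = np.nonzero(block < max_km)
        rows = local_rows + start  # block-local row numbers -> global positions
        keep = rows < cols  # each unordered pair once, no self-pairs
        if keep.any():
            parts.append(pd.DataFrame({
                'id_a': ids[rows[keep]],
                'id_b': ids[cols[keep]],
                'km': block[local_rows[keep], cols[keep]],
            }))
    if not parts:
        return pd.DataFrame({'id_a': [], 'id_b': [], 'km': []})
    return pd.concat(parts, ignore_index=True)

# Usage: three points on the equator, at longitudes 0, 1 and 50 degrees.
df = pd.DataFrame({'id': [10, 11, 12],
                   'Latitude': [0.0, 0.0, 0.0],
                   'Longitude': [0.0, 1.0, 50.0]})
pairs = pairs_within(df, max_km=200, chunk=2)
print(pairs)
```

Each temporary array inside the loop holds about `chunk * n` floats (roughly 307 MB for `chunk=1000` and n = 38333), so lower `chunk` if memory is tight. With a large `max_km` the result itself can approach n² / 2 rows, so the threshold is what keeps the output manageable.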