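Using the data in train_data_sample below and the following code, how can I loop over the latitude and longitude of every index? (See the desired result below.)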
   latitude  longitude    price
0   55.6632    12.6288  2595000
1   55.6637    12.6291  2850000
2   55.6637    12.6291  2850000
3   55.6632    12.6290  3198000
4   55.6632    12.6290  2995000
5   55.6638    12.6294  2395000
6   55.6637    12.6291  2995000
7   55.6642    12.6285  4495000
8   55.6632    12.6285  3998000
9   55.6638    12.6294  3975000
from numpy import cos, sin, arcsin, sqrt
from math import radians

def haversine(row):
    for index in train_data_sample.index:
        lon1 = train_data_sample["longitude"].loc[train_data_sample.index==index]
        lat1 = train_data_sample["latitude"].loc[train_data_sample.index==index]
        lon2 = row['longitude']
        lat2 = row['latitude']
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * arcsin(sqrt(a))
        km = 6367 * c
        return km

def insert_dist(df):
    df["distance"+str(index)] = df.apply(lambda row: haversine(row), axis=1)
    return df

print(insert_dist(train_data_sample))
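This is the result for index 0. It compares the coordinates of index 0 with those of every other row and returns the distance in kilometers (the formula uses an Earth radius of 6367 km). So the distance between the coordinates of index 0 and 1 is about 59 meters (0.058658 km).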
   latitude  longitude    price  distance0
0   55.6632    12.6288  2595000   0.000000
1   55.6637    12.6291  2850000   0.058658
2   55.6637    12.6291  2850000   0.058658
3   55.6632    12.6290  3198000   0.012536
4   55.6632    12.6290  2995000   0.012536
5   55.6638    12.6294  2395000   0.076550
6   55.6637    12.6291  2995000   0.058658
7   55.6642    12.6285  4495000   0.112705
8   55.6632    12.6285  3998000   0.018804
9   55.6638    12.6294  3975000   0.076550
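The 6367 in the code is the Earth's radius in kilometers, so the distance0 values are kilometers rather than meters. The stand-alone check below uses only the standard library and the same formula to recompute the distance between rows 0 and 1. The helper name `haversine_km` is illustrative only.

```python
from math import radians, sin, cos, asin, sqrt

R_KM = 6367  # Earth radius in km, the same constant the question's code uses

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * R_KM * asin(sqrt(a))

# Coordinates of rows 0 and 1 of train_data_sample
d_km = haversine_km(55.6632, 12.6288, 55.6637, 12.6291)
print(f"{d_km:.6f} km = {d_km * 1000:.1f} m")
```

The printed value should agree with the distance0 entry for row 1.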
The final result should contain not only distance0 but also distance1, distance2, and so on.
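The question's own apply-based approach can also be made to work. Two changes are needed. First, pass the reference point into the row function instead of looping inside it, since a `return` inside that loop stops after the first index. Second, do the looping over reference rows in insert_dist instead. The sketch below assumes the column names from the question and uses only the first four rows. The `haversine(row, lat1, lon1)` signature is a hypothetical modification of the original.

```python
import pandas as pd
from numpy import radians, sin, cos, arcsin, sqrt

# First four rows of the question's sample data
train_data_sample = pd.DataFrame({
    "latitude":  [55.6632, 55.6637, 55.6637, 55.6632],
    "longitude": [12.6288, 12.6291, 12.6291, 12.6290],
    "price":     [2595000, 2850000, 2850000, 3198000],
})

def haversine(row, lat1, lon1):
    """Distance in km from the fixed point (lat1, lon1) to this row's coordinates."""
    lat1, lon1 = radians(lat1), radians(lon1)
    lat2, lon2 = radians(row["latitude"]), radians(row["longitude"])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6367 * 2 * arcsin(sqrt(a))

def insert_dist(df):
    # Snapshot the coordinates so the loop is not affected by the columns it adds
    points = df[["latitude", "longitude"]].copy()
    for index, lat1, lon1 in points.itertuples(name=None):
        # apply passes each row as the first argument, then the extra args
        df["distance" + str(index)] = df.apply(haversine, axis=1, args=(lat1, lon1))
    return df

result = insert_dist(train_data_sample)
print(result)
```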
Answer 0 (score: 1)
It looks like you have made this much more complicated than necessary. You can get what you want more directly by nesting one for loop inside another.
from numpy import cos, sin, arcsin, sqrt
from math import radians
import pandas as pd
import numpy as np

# recreate your dataframe
data = [[55.6632, 12.6288, 2595000],
        [55.6637, 12.6291, 2850000],
        [55.6637, 12.6291, 2850000],
        [55.6632, 12.6290, 3198000]]
data = np.array(data)
train_data_sample = pd.DataFrame(data, columns = ["latitude", "longitude", "price"])

# distance-calculating code copied from the question
def GetDistance(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * arcsin(sqrt(a))
    km = 6367 * c
    return km

# loop over every row with iterrows
for index, row in train_data_sample.iterrows():
    distances = []
    # unpack this row's latitude and longitude
    lat1, lon1 = row[["latitude", "longitude"]]
    # loop again over every row with iterrows
    for index_2, row_2 in train_data_sample.iterrows():
        lat2, lon2 = row_2[["latitude", "longitude"]]
        # get the distance
        distances.append( GetDistance(lon1, lat1, lon2, lat2) )
    # add the column to the dataframe
    train_data_sample["distance"+str(index)] = distances

print(train_data_sample)
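NumPy's sin, cos, arcsin and sqrt accept whole arrays, so the inner loop can be collapsed into a single vectorized call per row. The sketch below assumes the same four-row sample. `get_distance` is a hypothetical NumPy-only variant of `GetDistance`.

```python
import numpy as np
import pandas as pd

train_data_sample = pd.DataFrame({
    "latitude":  [55.6632, 55.6637, 55.6637, 55.6632],
    "longitude": [12.6288, 12.6291, 12.6291, 12.6290],
    "price":     [2595000, 2850000, 2850000, 3198000],
})

def get_distance(lon1, lat1, lon2, lat2):
    """Haversine distance in km; inputs in degrees, scalars or NumPy arrays."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

# Take the coordinate arrays once, before any distance columns are added
lats = train_data_sample["latitude"].to_numpy()
lons = train_data_sample["longitude"].to_numpy()
for index, lat1, lon1 in zip(train_data_sample.index, lats, lons):
    # One call computes this row's distance to every row at once
    train_data_sample["distance" + str(index)] = get_distance(lon1, lat1, lons, lats)

print(train_data_sample)
```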
Answer 1 (score: 0)
I wouldn't use apply here, because it works row by row. Instead, I'd use a NumPy matrix approach.
First, convert all degrees to radians:
import numpy as np
from numpy import cos, sin, arcsin, sqrt

df['latitude'] *= np.pi/180
df['longitude'] *= np.pi/180
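Note that these in-place conversions overwrite the degree values in `df`, so the latitude and longitude columns in the concatenated result will be in radians. If you want to keep `df` in degrees, you can convert into separate arrays instead. A minimal sketch using `np.radians`, which computes the same thing as multiplying by π/180:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"latitude": [55.6632, 55.6637], "longitude": [12.6288, 12.6291]})

# np.radians(x) is equivalent to x * np.pi / 180
lat_rad = np.radians(df["latitude"].to_numpy())
lon_rad = np.radians(df["longitude"].to_numpy())

# df itself stays in degrees
print(df)
print(lat_rad, lon_rad)
```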
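Then turn the latitude and longitude vectors into square matrices by repeating each vector as many times as it has elements. For lat2/lon2, take the transpose.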
lat1 = np.tile(df['latitude'].values.reshape([-1,1]),(1,df.shape[0]))
lon1 = np.tile(df['longitude'].values.reshape([-1,1]),(1,df.shape[0]))
lat2 = np.transpose(lat1)
lon2 = np.transpose(lon1)
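The tiled matrices are arranged so that lat1[i, j] holds row i's latitude and lat2[i, j] holds row j's. As a result, lat2 - lat1 pairs every row with every other row. NumPy broadcasting gives the same differences without building the tiled copies. A small check on three latitudes:

```python
import numpy as np

lat = np.radians(np.array([55.6632, 55.6637, 55.6632]))
n = lat.shape[0]

lat1 = np.tile(lat.reshape([-1, 1]), (1, n))  # lat1[i, j] == lat[i]
lat2 = np.transpose(lat1)                     # lat2[i, j] == lat[j]

# Broadcasting a row vector against a column vector yields the same n x n differences
dlat_tiled = lat2 - lat1
dlat_broadcast = lat[np.newaxis, :] - lat[:, np.newaxis]
print(lat1.shape, dlat_broadcast.shape)
```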
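You now have four matrices that together contain every combination of latitude/longitude pairs. You can apply the formula once and get all the distances at the same time: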
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * arcsin(sqrt(a))
km = 6367 * c
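For reference, the element-wise expressions above are the standard haversine formula. Here φ denotes latitude and λ longitude (both in radians), and R is the Earth-radius constant of 6367 km used throughout this thread:

```latex
a = \sin^2\!\left(\frac{\varphi_2-\varphi_1}{2}\right)
  + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\lambda_2-\lambda_1}{2}\right),
\qquad
d = 2R\arcsin\sqrt{a}, \quad R = 6367\ \text{km}
```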
The result can then be joined onto the original dataframe:
result = pd.concat([df,pd.DataFrame(km,columns=df.index)],axis=1)
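There are two details to watch in this concat. First, `pd.DataFrame(km, columns=df.index)` gets a default 0..n-1 row index, so it only aligns with `df` if `df` also has a default index. Second, its columns are named 0, 1, 2, ... rather than distance0, distance1, .... The sketch below combines the steps above and assumes the question's column names. `add_distance_columns` is a hypothetical helper, and it uses broadcasting in place of `np.tile`.

```python
import numpy as np
import pandas as pd

def add_distance_columns(df, radius_km=6367):
    """Return a copy of df with one distanceN column per row (haversine, km)."""
    lat = np.radians(df["latitude"].to_numpy())
    lon = np.radians(df["longitude"].to_numpy())
    # Column vector (row i) against row vector (row j) gives n x n matrices
    lat1, lat2 = lat[:, np.newaxis], lat[np.newaxis, :]
    lon1, lon2 = lon[:, np.newaxis], lon[np.newaxis, :]
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    # Guard against rounding pushing a slightly outside [0, 1]
    km = 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    # Reuse df's own index so the concat aligns, and name the columns distanceN
    dist = pd.DataFrame(km, index=df.index, columns=[f"distance{i}" for i in df.index])
    return pd.concat([df, dist], axis=1)

train_data_sample = pd.DataFrame({
    "latitude":  [55.6632, 55.6637, 55.6637, 55.6632, 55.6632,
                  55.6638, 55.6637, 55.6642, 55.6632, 55.6638],
    "longitude": [12.6288, 12.6291, 12.6291, 12.6290, 12.6290,
                  12.6294, 12.6291, 12.6285, 12.6285, 12.6294],
    "price":     [2595000, 2850000, 2850000, 3198000, 2995000,
                  2395000, 2995000, 4495000, 3998000, 3975000],
})

result = add_distance_columns(train_data_sample)
print(result)
```

Because the input is never modified, the latitude and longitude columns of `result` stay in degrees.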