Calculating the distance between latitudes and longitudes given in a pandas dataframe

Time: 2021-07-17 07:17:23

Tags: python pandas

I have a dataframe like the following:

      payeeId    latHome   longHome     Total_Amnt
 0  193fde722     0.000000   0.000000        15.0
 1  4d8ecb2b5c   28.425515  77.097547        10.0
 2  2c3ea738     28.542923  77.253164        20.0
 3  2961f3e8     28.542898  77.253162        10.0
 4  5cda3d3763   28.461630  77.031944    129000.0
 5  3cb02ccbfc   26.180680  91.740042       220.0
 6  79918aae03    0.000000   0.000000      1760.0

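I am trying to calculate the distance between two consecutive latHome/longHome points. For this, I followed this SO post. Below is the function I am using and then applying: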

import numpy as np

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))
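For reference, the function above evaluates the standard haversine formula, with latitudes $\varphi$ and longitudes $\lambda$ in radians and $R$ the Earth's radius (the `earth_radius=6371` default, in km), so the result is in kilometres on a spherical Earth:

```latex
a = \sin^2\!\left(\frac{\varphi_2-\varphi_1}{2}\right)
  + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\lambda_2-\lambda_1}{2}\right),
\qquad
d = 2R\,\arcsin\sqrt{a}
```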

df_c['Dist_p'] = df_c.apply(haversine(lat1=df_c['latHome'].astype(float).shift(), \
                            lon1 = df_c['longHome'].astype(float).shift(), \
                            lat2 = df_c['latHome'].astype(float), \
                            lon2=df_c['longHome'].astype(float)))

But I get the following error:

ValueError: no results

When I call this function directly, i.e. without apply, I also get the following error:

File "<ipython-input-1-3a6c757b499d>", line 13, in haversine
    lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

TypeError: loop of ufunc does not support argument 0 of type Series which has no callable radians method

Any clues would be appreciated.

1 answer:

Answer 0 (score: 0)

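  • There are several libraries and functions for calculating distances. I used geopy.
  • The key is aligning the data/rows that are passed to the chosen distance function. For this use case that means shift(-1). However, passing NaN to this distance function is invalid, so the last row, which has no next row, is paired with (0, 0) via fillna(0).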
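To make the alignment point concrete, here is a minimal pandas-only sketch (no geopy) on a hypothetical three-row frame built from the first rows of the question's data. It joins each row with the next row's coordinates, the same `join(..., rsuffix="_2")` pattern the answer uses, and shows where the NaN that `fillna(0)` replaces comes from:

```python
import pandas as pd

# Hypothetical three-row frame, used only to show how shift(-1) lines rows up.
pts = pd.DataFrame({
    "latHome": [0.0, 28.425515, 28.542923],
    "longHome": [0.0, 77.097547, 77.253164],
})

# Each row is joined with the *next* row's coordinates (columns suffixed "_2").
pairs = pts.join(pts.shift(-1), rsuffix="_2")
print(pairs)

# The last row has no successor, so its "_2" columns are NaN.
# fillna(0) pairs it with (0, 0); dropna() would discard that row instead.
print(pairs.dropna())
```

Whether to pad with (0, 0) or drop the row is a judgement call: padding keeps one value per row but produces a "distance to (0, 0)" that is not a real trip.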
import io

import geopy.distance
import pandas as pd

df = pd.read_csv(io.StringIO("""      payeeId    latHome   longHome     Total_Amnt
 0  193fde722     0.000000   0.000000        15.0
 1  4d8ecb2b5c   28.425515  77.097547        10.0
 2  2c3ea738     28.542923  77.253164        20.0
 3  2961f3e8     28.542898  77.253162        10.0
 4  5cda3d3763   28.461630  77.031944    129000.0
 5  3cb02ccbfc   26.180680  91.740042       220.0
 6  79918aae03    0.000000   0.000000      1760.0"""), sep=r"\s+")


# prep tuples to pass to geopy.distance
df['Dist_p'] = df.loc[:, ["latHome", "longHome"]].join(
    df.loc[:, ["latHome", "longHome"]].shift(-1).fillna(0), rsuffix="_2"
).apply(
    lambda r: geopy.distance.geodesic(
        (r["latHome"], r["longHome"]), (r["latHome_2"], r["longHome_2"])
    ).km,
    axis=1,
)

       payeeId   latHome  longHome  Total_Amnt      Dist_p
0    193fde722         0         0          15     8753.19
1   4d8ecb2b5c   28.4255   77.0975          10     20.0375
2     2c3ea738   28.5429   77.2532          20  0.00277761
3     2961f3e8   28.5429   77.2532          10     23.4558
4   5cda3d3763   28.4616   77.0319      129000     1476.46
5   3cb02ccbfc   26.1807     91.74         220     10189.4
6   79918aae03         0         0        1760           0
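As an alternative that avoids geopy, a sketch: the question's `haversine` already works element-wise, so it can be called once on whole columns with no `apply` at all. (In `df_c.apply(haversine(...))`, `haversine` runs first and `apply` receives its result rather than a function, which is a likely source of the `ValueError`.) Assumptions: the radians conversion is done per argument instead of on a list, which sidesteps the list-of-Series conversion named in the traceback; `shift()` (not `shift(-1)`) is used, so each row holds the distance from the *previous* row and row 0 is left as NaN instead of being padded with (0, 0).

```python
import numpy as np
import pandas as pd


def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """Great-circle distance in km on a spherical Earth; element-wise on arrays."""
    if to_radians:
        # Convert each argument separately rather than wrapping them in a list.
        lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


# Coordinates copied from the question's dataframe.
df_c = pd.DataFrame({
    "latHome": [0.0, 28.425515, 28.542923, 28.542898, 28.461630, 26.180680, 0.0],
    "longHome": [0.0, 77.097547, 77.253164, 77.253162, 77.031944, 91.740042, 0.0],
})

lat = df_c["latHome"].astype(float)
lon = df_c["longHome"].astype(float)

# Row i gets the distance from row i-1 to row i; row 0 has no predecessor -> NaN.
df_c["Dist_p"] = haversine(
    lat.shift().to_numpy(), lon.shift().to_numpy(), lat.to_numpy(), lon.to_numpy()
)
print(df_c)
```

Because this is one vectorized NumPy call, it avoids a Python-level loop over rows. Expect small differences from the geopy figures above: haversine assumes a sphere, while `geopy.distance.geodesic` uses the WGS-84 ellipsoid by default.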