Pandas纬度 - 连续行之间距离的经度

时间:2016-11-06 18:10:11

标签: python pandas numpy geospatial haversine

我在Python 2.7中的Pandas DataFrame中有以下内容:

Ser_Numb        LAT      LONG
       1  74.166061 30.512811
       2  72.249672 33.427724
       3  67.499828 37.937264
       4  84.253715 69.328767
       5  72.104828 33.823462
       6  63.989462 51.918173
       7  80.209112 33.530778
       8  68.954132 35.981256
       9  83.378214 40.619652
       10 68.778571 6.607066

我希望计算数据帧中连续行之间的距离。输出应该如下所示:

Ser_Numb          LAT        LONG   Distance
       1    74.166061   30.512811          0
       2    72.249672   33.427724          d_between_Ser_Numb2 and Ser_Numb1
       3    67.499828   37.937264          d_between_Ser_Numb3 and Ser_Numb2
       4    84.253715   69.328767          d_between_Ser_Numb4 and Ser_Numb3
       5    72.104828   33.823462          d_between_Ser_Numb5 and Ser_Numb4
       6    63.989462   51.918173          d_between_Ser_Numb6 and Ser_Numb5
       7    80.209112   33.530778   .
       8    68.954132   35.981256   .
       9    83.378214   40.619652   .
       10   68.778571   6.607066    .

尝试

This post看起来有些相似,但它正在计算固定点之间的距离。我需要连续点之间的距离。

我尝试按如下方式对其进行调整:

df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LONG'])
df['dLON'] = df['LON_rad'] - np.radians(df['LON_rad'].shift(1))
df['dLAT'] = df['LAT_rad'] - np.radians(df['LAT_rad'].shift(1))
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))

但是,我收到以下错误:

Traceback (most recent call last):
  File "C:\Python27\test.py", line 115, in <module>
    df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 78, in wrapper
    "{0}".format(str(converter)))
TypeError: cannot convert the series to <type 'float'>
[Finished in 2.3s with exit code 1]

此错误是根据MaxU的评论修复的。通过修复,此计算的输出没有意义 - 距离接近8000 km:

   Ser_Numb        LAT       LONG   LAT_rad   LON_rad      dLON      dLAT     distance
0         1  74.166061  30.512811  1.294442  0.532549       NaN       NaN          NaN
1         2  72.249672  33.427724  1.260995  0.583424  0.574129  1.238402  8010.487211
2         3  67.499828  37.937264  1.178094  0.662130  0.651947  1.156086  7415.364469
3         4  84.253715  69.328767  1.470505  1.210015  1.198459  1.449943  9357.184623
4         5  72.104828  33.823462  1.258467  0.590331  0.569212  1.232802  7992.087820
5         6  63.989462  51.918173  1.116827  0.906143  0.895840  1.094862  7169.812123
6         7  80.209112  33.530778  1.399913  0.585222  0.569407  1.380421  8851.558260
7         8  68.954132  35.981256  1.203477  0.627991  0.617777  1.179044  7559.609520
8         9  83.378214  40.619652  1.455224  0.708947  0.697986  1.434220  9194.371978
9        10  68.778571   6.607066  1.200413  0.115315  0.102942  1.175014          NaN

根据:

  • online calculator:如果我使用Latitude1 = 74.166061, 经度1 = 30.512811,纬度2 = 72.249672,经度2 = 33.427724 然后我得到233公里
  • 发现了半胱氨酸功能 here as:print haversine(30.512811, 74.166061, 33.427724, 72.249672)然后我 得到232.55公里

答案应该是233公里,但我的方法是给出~8000公里。我认为我试图在连续的行之间进行迭代是有问题的。

问题: 在熊猫中有办法做到这一点吗?或者我是否需要一次循环数据帧?

其他信息:

要创建上述DF,请选择它并复制到剪贴板。然后:

import pandas as pd
df = pd.read_clipboard()
print df

1 个答案:

答案 0 :(得分:27)

你可以使用this great solution (c) @derricw(别忘了赞成它; - ):

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    slightly modified version: of http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.

    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))


df['dist'] = \
    haversine_np(df.LONG.shift(), df.LAT.shift(),
                 df.loc[1:, 'LONG'], df.loc[1:, 'LAT'])

结果:

In [566]: df
Out[566]:
   Ser_Numb        LAT       LONG         dist
0         1  74.166061  30.512811          NaN
1         2  72.249672  33.427724   232.549785
2         3  67.499828  37.937264   554.905446
3         4  84.253715  69.328767  1981.896491
4         5  72.104828  33.823462  1513.397997
5         6  63.989462  51.918173  1164.481327
6         7  80.209112  33.530778  1887.256899
7         8  68.954132  35.981256  1252.531365
8         9  83.378214  40.619652  1606.340727
9        10  68.778571   6.607066  1793.921854

更新:这有助于理解逻辑:

In [573]: pd.concat([df['LAT'].shift(), df.loc[1:, 'LAT']], axis=1, ignore_index=True)
Out[573]:
           0          1
0        NaN        NaN
1  74.166061  72.249672
2  72.249672  67.499828
3  67.499828  84.253715
4  84.253715  72.104828
5  72.104828  63.989462
6  63.989462  80.209112
7  80.209112  68.954132
8  68.954132  83.378214
9  83.378214  68.778571