Question

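I'm trying to compute the geographic distance between each row's coordinates and those of the previous row. Is there a way to do this without adding extra columns to the DataFrame?

Example code: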

import pandas
import geopy.distance

d = {'id_col':['A','B','C','D'], 
  'lat':[ 40.8397,40.7664,40.6845,40.6078], 
  'lon':[-104.9661,-104.999,-105.01,-105.003]
   }
df = pandas.DataFrame(data=d)

First approach, using lambda and apply:

df['geo_dist']=df.apply(lambda x: geopy.distance.geodesic((x['lat'],x['lon']),(x['lat'].shift(),x['lon']).shift()),axis=1)

This raises the error: AttributeError: ("'float' object has no attribute 'shift'", u'occurred at index 0')

The second approach is to call a function on the whole DataFrame:

def geodist(x):
    return geopy.distance.geodesic((x['lat'],x['lon']),(x['lat'].shift(),x['lon']).shift())

df['geo_dist']=geodist(df)

In this case I get the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any help is greatly appreciated.

Answer 1

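The first approach can't work as written, because with axis=1 the lambda is applied to one row of the DataFrame at a time. Inside the lambda, x is a single row, so x['lat'] is a float rather than the full column of observations you expected, and a float has no shift method. Instead, you can use x.name - 1 to get the index of the previous row and look up that position in df. This relies on the DataFrame having the default integer index (0, 1, 2, ...), so that x.name matches the row position. The first row has no predecessor, so its distance is set to 0: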

df['geo_dist']=df.apply(lambda x: geopy.distance.geodesic((x['lat'],x['lon']),(df.iloc[x.name - 1].lat,df.iloc[x.name - 1].lon)) if x.name > 0 else 0,axis=1)
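As an alternative sketch, the shift idea from the question does work when it is applied to whole columns rather than to values inside a row. The shifted columns can live in local variables, so no helper columns are added to the DataFrame. To keep the example free of third-party dependencies beyond pandas, it uses a hypothetical haversine_km helper (a spherical great-circle approximation, not geopy's ellipsoidal geodesic), and the Earth radius is an assumed mean value. With geopy installed, geopy.distance.geodesic((plat, plon), (lat, lon)).km could be used in its place.

```python
import math
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    # Hypothetical helper: great-circle distance on a sphere, in kilometres.
    # This is an approximation, not geopy's ellipsoidal geodesic.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

df = pd.DataFrame({
    'id_col': ['A', 'B', 'C', 'D'],
    'lat': [40.8397, 40.7664, 40.6845, 40.6078],
    'lon': [-104.9661, -104.999, -105.01, -105.003],
})

# Shift whole columns; the results stay in local variables,
# so df gains no intermediate columns.
prev_lat = df['lat'].shift()
prev_lon = df['lon'].shift()

# The first row has no predecessor (its shifted values are NaN), so use 0.0 there.
df['geo_dist'] = [
    0.0 if math.isnan(plat) else haversine_km(plat, plon, lat, lon)
    for lat, lon, plat, plon in zip(df['lat'], df['lon'], prev_lat, prev_lon)
]
```

Unlike the df.iloc[x.name - 1] lookup, this pairing depends only on row order, so it should also behave the same when the index is not the default 0, 1, 2, ...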

Hope this helps.
