Iterating over two data frames to apply a function

Time: 2018-12-12 03:21:52

Tags: python pandas

I have the following two data frames (shortened):

df1
day Transmitter_ID  Species Lat Lng Date
4   A69-1601-27466  Golden perch    -35.495479100000004 144.45295380000002  13/08/2015
5   A69-1601-27466  Golden perch    -35.495479100000004 144.45295380000002  14/08/2015
6   A69-1601-27466  Golden perch    -35.495479100000004 144.45295380000002  15/08/2015
7   A69-1601-27466  Golden perch    -35.495479100000004 144.45295380000002  16/08/2015
8   A69-1601-27466  Golden perch    -35.5065473 144.4488804 17/08/2015
8   A69-1601-27466  Golden perch    -35.495479100000004 144.45295380000002  17/08/2015
9   A69-1601-27466  Golden perch    -35.5065473 144.4488804 18/08/2015
10  A69-1601-27466  Golden perch    -35.5065473 144.4488804 19/08/2015
11  A69-1601-27466  Golden perch    -35.5065473 144.4488804 20/08/2015
12  A69-1601-27466  Golden perch    -35.5065473 144.4488804 21/08/2015
13  A69-1601-27466  Golden perch    -35.5065473 144.4488804 22/08/2015
14  A69-1601-27466  Golden perch    -35.5065473 144.4488804 23/08/2015
15  A69-1601-27466  Golden perch    -35.5065473 144.4488804 24/08/2015

rivergps_df
Lng Lat River
151.7753278 -32.90526725    HUNTER RIVER
151.77526830000002  -32.90610052    HUNTER RIVER
151.77526830000002  -32.90752299    HUNTER RIVER
151.77526830000002  -32.90758849    HUNTER RIVER
151.775397  -32.90977754    HUNTER RIVER
151.7754468 -32.91062396    HUNTER RIVER
151.775578  -32.91202941    HUNTER RIVER
151.77578799999998  -32.9142797 HUNTER RIVER
151.7758178 -32.91459931    HUNTER RIVER
151.77586340000002  -32.91508789    HUNTER RIVER
151.7764116 -32.91645856    HUNTER RIVER
151.7765776 -32.91687345    HUNTER RIVER
151.77719040000002  -32.91861786    HUNTER RIVER

I also have a haversine function that takes two lng/lat pairs and returns the distance between them:

from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

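I want to process the two data frames as follows:

For each lng/lat point in df1, apply the haversine function against every lng/lat point in rivergps_df

Return the index of the rivergps_df row at which the haversine function gives its minimum value

Append this rivergps_df index to df1

In other words, for the first point in df1, -35.495479100000004, 144.45295380000002, I want to apply the haversine function with that point as lon1, lat1 and with lon2, lat2 ranging over every point in rivergps_df. I then want to look up the minimum value returned by the haversine function, append it to df1, and move on to the next point in df1.

How can I do this?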

1 answer:

Answer 0 (score: 0)

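One idea:

  • Define a function haversin_argmin(lat, lon, df) that iterates over df (e.g. for idx, row in df[['Lat', 'Lng']].iterrows():), computes haversine(lon, lat, row['Lng'], row['Lat']) for each row, and returns the index at which the distance is smallest (the argmin).

  • Then define another function f that takes a row, reads its lat and lon, calls haversin_argmin with rivergps_df, and returns the row with the argmin appended as a new field.

  • Use pandas.DataFrame.apply with axis=1 to apply f to each row of df1.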
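The three steps above could be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the column name river_idx is an assumption, the sample frames contain only a few rows taken from the question, and f here returns just the nearest index (assigned as a new column) instead of a modified row, which is a small simplification of the second step.

```python
from math import radians, sin, cos, asin, sqrt

import pandas as pd


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * 6371  # 6371 km: radius of the earth


def haversin_argmin(lat, lon, df):
    """Return the index label of the row in df that is closest to (lat, lon)."""
    distances = {
        idx: haversine(lon, lat, row["Lng"], row["Lat"])
        for idx, row in df[["Lat", "Lng"]].iterrows()
    }
    return min(distances, key=distances.get)


def f(row, river_df):
    """Return the index of the river_df point nearest to this df1 row."""
    return haversin_argmin(row["Lat"], row["Lng"], river_df)


# A few rows copied from the question, for illustration only
df1 = pd.DataFrame({
    "Lat": [-35.4954791, -35.5065473],
    "Lng": [144.4529538, 144.4488804],
})
rivergps_df = pd.DataFrame({
    "Lng": [151.7753278, 151.7771904],
    "Lat": [-32.90526725, -32.91861786],
    "River": ["HUNTER RIVER", "HUNTER RIVER"],
})

# axis=1 passes each row of df1 to f; args forwards rivergps_df as river_df
df1["river_idx"] = df1.apply(f, axis=1, args=(rivergps_df,))
```

Note that this loops over every row of rivergps_df for every row of df1 in pure Python, so it will likely be slow on large frames.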

Read the documentation for apply to better understand how to define f and which options to pass to apply.
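As a quick illustration of the apply options most relevant here, the sketch below shows axis, args, and result_type on a tiny frame. The function describe and the variable names are hypothetical, chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Lat": [-35.4954791, -35.5065473],
    "Lng": [144.4529538, 144.4488804],
})


def describe(row, prefix):
    # With axis=1, row is a Series indexed by column name
    return f"{prefix}: {row['Lat']:.3f}, {row['Lng']:.3f}"


# args forwards extra positional arguments to the function;
# a scalar return value per row yields a Series
labels = df.apply(describe, axis=1, args=("point",))

# The default axis=0 passes each column instead of each row
col_max = df.apply(lambda col: col.max())

# result_type="expand" turns a list-like return value into columns
swapped = df.apply(
    lambda row: (row["Lng"], row["Lat"]), axis=1, result_type="expand"
)
```

For the problem in the question, axis=1 is the important option, since f needs to see one row of df1 at a time.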