Calculating distance between grouped subsets using gdist()

Time: 2018-02-13 19:20:25

Tags: r group-by gps dplyr

A subset of my data for 2 individuals (squirrelID) can be found here.

My data look like this (only the relevant columns are shown):

lat                  lon                NatalMidden   squirrelID    type
60.9577819984406    -138.0347849708050  -27           NA            Nest2017
60.9574120212346    -138.0345689691600  -27           NA            NatalMidden
60.9578209742904    -138.0346520338210  -27           23054         Foray
60.9575380012393    -138.0348329991100  -27           23054         Foray
60.9576250053942    -138.0339069664480  -27           23054         Foray
60.957643026486     -138.0338829942050  -27           23054         Foray
60.9575670026243    -138.0348739866170  -27           23054         Foray

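For example, squirrelID 23054 was located on a Foray multiple times (type column), and I have the corresponding latitude (lat) and longitude (lon) for each Foray. I am trying to calculate the distance between each Foray (type column) and the Nest2017 (type column), separately for each individual (squirrelID column).

The following code works (and gives me a value of 15.11501 m), but it requires me to enter each point manually. That would not be a problem, except that I am working with 2000+ observations, with more than 2 options in each of the grid, NatalMidden, and squirrelID columns.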

library(Imap)

gdist(60.9578209742904,-138.0346520338210, 60.9577819984406, -138.0347849708050, units="m", verbose=FALSE)

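Is there a way to work within the dplyr framework, using group_by(squirrelID), and then calculate the distance between each Foray and its corresponding Nest2017 (that is, the Nest2017 with the same NatalMidden as the Foray)?

My end goal is to create a new column holding the distance between the Nest2017 and each Foray for every squirrelID.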

Update:

I tried the following:

nests<-df %>% #creating a new data frame for Nest2017 points only
    filter(type %in% "Nest2017") %>%
    select(ID,lat,lon,ele,grid,NatalMidden,type)

foray<-df %>% #creating a new data frame for Foray points only
    filter(type %in% "Foray") %>%
    mutate(sq_id=as.factor(sq_id)) %>%
    group_by(sq_id)

But these subsets do not work in the gdist function (I get an error).

1 Answer:

Answer 0 (score: 0)

I am not very familiar with the dplyr package, but I think this does what you are interested in:

library(dplyr)
library(Imap)

# read data from the FigShare linked file
squirrel_data <- read.table("figshare.txt", header=T)

# split into 'Forays' and 'Nests'
nests <- squirrel_data %>% filter(type %in% "Nest2017")
foray <- squirrel_data %>% filter(type %in% "Foray")

# merge 'Forays' and 'Nests' by 'NatalMidden'
nests_foray <- inner_join(
    nests, foray,
    by = "NatalMidden",
    suffix = c(".nest", ".foray"))

# calculate the distance for each row, keep 'SquirrelID' and 'Dist'
results <- nests_foray %>%
    rowwise() %>%
    mutate(dist = gdist(lat.nest, lon.nest, lat.foray, lon.foray, units = "m")) %>%
    select(squirrelID.foray, dist)

head(results, n = 3)
## A tibble: 3 x 2
#  squirrelID.foray     dist
#             <int>    <dbl>
#1            22684 14.03843
#2            22684 59.06996
#3            22684 13.40567

This is basically what I proposed in my first comment, but using dplyr functions rather than merge. The idea is simply to create an inner join between the "Foray" rows and the "Nest2017" rows by "NatalMidden", then compute the distance for each row and report it together with the "squirrelID". I hope this helps.
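If the goal is a new column on the Foray rows themselves, the same join idea can be sketched as below. This is only a sketch under assumptions: the toy data frame is copied by hand from the first rows of the table in the question, and the names nest_lat, nest_lon, and dist_m are made up for illustration. As far as I can tell, the Imap documentation lists the arguments in the order lon.1, lat.1, lon.2, lat.2, so the sketch passes them by name; the resulting values may therefore differ from the ones shown above.

```r
library(dplyr)
library(Imap)

# Toy data copied from the table in the question (first four rows only).
df <- data.frame(
  lat = c(60.9577819984406, 60.9574120212346,
          60.9578209742904, 60.9575380012393),
  lon = c(-138.0347849708050, -138.0345689691600,
          -138.0346520338210, -138.0348329991100),
  NatalMidden = -27,
  squirrelID = c(NA, NA, 23054, 23054),
  type = c("Nest2017", "NatalMidden", "Foray", "Foray"),
  stringsAsFactors = FALSE
)

# One row per nest, with the coordinates renamed so they do not clash
# with the Foray coordinates after the join (hypothetical column names).
nests <- df %>%
  filter(type == "Nest2017") %>%
  select(NatalMidden, nest_lat = lat, nest_lon = lon)

# Attach the matching nest to every Foray row, then compute the distance
# row by row; rowwise() is used because gdist() is called on one pair of
# points at a time here.
foray_dist <- df %>%
  filter(type == "Foray") %>%
  left_join(nests, by = "NatalMidden") %>%
  rowwise() %>%
  mutate(dist_m = gdist(lon.1 = lon, lat.1 = lat,
                        lon.2 = nest_lon, lat.2 = nest_lat,
                        units = "m")) %>%
  ungroup()

print(foray_dist)
```

Using left_join rather than inner_join keeps Foray rows that have no matching nest (their distance would be NA), which may or may not be what you want. If a NatalMidden has more than one Nest2017 row, each Foray is repeated once per nest.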