Question

A subset of my data for 2 individuals (squirrelID) can be found here.

My data looks like this (only the relevant columns are shown):

lat                  lon                NatalMidden   squirrelID    type
60.9577819984406    -138.0347849708050  -27           NA            Nest2017
60.9574120212346    -138.0345689691600  -27           NA            NatalMidden
60.9578209742904    -138.0346520338210  -27           23054         Foray
60.9575380012393    -138.0348329991100  -27           23054         Foray
60.9576250053942    -138.0339069664480  -27           23054         Foray
60.957643026486     -138.0338829942050  -27           23054         Foray
60.9575670026243    -138.0348739866170  -27           23054         Foray

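For example, squirrelID 23054 went on a Foray (type column) multiple times, and I have the corresponding latitude (lat) and longitude (lon) for each Foray. I am trying to calculate the distance between each Foray and the Nest2017 point (both in the type column) separately for each individual (squirrelID column).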

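The following code works (and gives me a value of 15.11501 m), but it requires me to enter each point manually. That is not a problem in itself, but I am working with 2000+ observations, and each of the grid, NatalMidden and squirrelID columns has more than 2 levels.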

library(Imap)

gdist(60.9578209742904,-138.0346520338210, 60.9577819984406, -138.0347849708050, units="m", verbose=FALSE)
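As far as I can tell from the Imap documentation, gdist() takes its arguments as gdist(lon.1, lat.1, lon.2, lat.2, ...), longitude first, while the call above passes latitude first. A quick way to cross-check is a small base-R haversine function applied to the same two points in longitude-first order. The helper haversine_m() below is hypothetical (not part of Imap) and assumes a spherical Earth, so it will not match gdist()'s ellipsoidal result exactly. For these points it gives roughly 8.4 m rather than 15.1 m, which suggests the argument order matters here.

```r
# Hypothetical helper (not part of Imap): haversine great-circle distance in
# metres on a sphere of mean radius 6371008.8 m. Imap::gdist() uses an
# ellipsoid instead, so its results will differ slightly.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) just above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Foray point and Nest2017 point from the table, longitude first
d <- haversine_m(-138.0346520338210, 60.9578209742904,
                 -138.0347849708050, 60.9577819984406)
print(round(d, 2))
```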

Is there a way to group_by(squirrelID) in the dplyr framework and then calculate the distance between each Foray and its corresponding Nest2017 (the Nest2017 point that has the same NatalMidden)?

My end goal is to create a new column holding the distance between each Foray and its Nest2017, for each squirrelID.

UPDATE:

I have tried the following:

library(dplyr)

nests <- df %>%  # creating a new data frame for Nest2017 points only
  filter(type %in% "Nest2017") %>%
  select(ID, lat, lon, ele, grid, NatalMidden, type)

foray <- df %>%  # creating a new data frame for Foray points only
  filter(type %in% "Foray") %>%
  mutate(sq_id = as.factor(sq_id)) %>%
  group_by(sq_id)

but passing these subsets to the gdist function does not work (I get an error).

Answer 1

I'm not very familiar with the dplyr package, but I think this does what you're interested in:

library(dplyr)
library(Imap)

# read data from the FigShare linked file
squirrel_data <- read.table("figshare.txt", header = TRUE)

# split into 'Forays' and 'Nests'
nests <- squirrel_data %>% filter(type %in% "Nest2017")
foray <- squirrel_data %>% filter(type %in% "Foray")

# merge 'Forays' and 'Nests' by 'NatalMidden'
nests_foray <- inner_join(nests, foray, by = "NatalMidden",
                          suffix = c(".nest", ".foray"))

# calculate the distance for each row, keep 'squirrelID' and 'dist'
# Imap::gdist() expects longitude before latitude: gdist(lon.1, lat.1, lon.2, lat.2, ...)
results <- nests_foray %>%
  rowwise() %>%
  mutate(dist = gdist(lon.nest, lat.nest, lon.foray, lat.foray, units = "m")) %>%
  select(squirrelID.foray, dist)

head(results, n = 3)
# output below was produced with an earlier call that passed latitude first
## A tibble: 3 x 2
#   squirrelID.foray     dist
#              <int>    <dbl>
#1             22684 14.03843
#2             22684 59.06996
#3             22684 13.40567

This is basically what I proposed in my first comment, but using dplyr functions. The idea is simply to create an inner join between the "Foray" rows and the "Nest2017" rows by "NatalMidden", then calculate the distance for each row and report it together with "squirrelID". I hope this helps.
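The same join-then-compute idea can also be sketched in base R, without dplyr or Imap, on the rows shown in the question. This is a minimal sketch under a few assumptions: rows are matched on NatalMidden only (grid could be added to `by` if it matters), and distances come from a hypothetical haversine_m() helper on a spherical Earth rather than from gdist(), so values will differ slightly from gdist()'s ellipsoidal ones. Because that helper is vectorised, the distance column is filled in one step; grouping by squirrel only comes in afterwards, for per-individual summaries.

```r
# Rows copied from the table in the question (a single NatalMidden, -27)
df <- data.frame(
  lat = c(60.9577819984406, 60.9574120212346, 60.9578209742904,
          60.9575380012393, 60.9576250053942, 60.957643026486,
          60.9575670026243),
  lon = c(-138.0347849708050, -138.0345689691600, -138.0346520338210,
          -138.0348329991100, -138.0339069664480, -138.0338829942050,
          -138.0348739866170),
  NatalMidden = -27,
  squirrelID = c(NA, NA, rep(23054L, 5)),
  type = c("Nest2017", "NatalMidden", rep("Foray", 5))
)

# Hypothetical spherical helper (haversine, metres); vectorised over its
# arguments, so no rowwise() step is needed
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371008.8) {
  to_rad <- pi / 180
  a <- sin((lat2 - lat1) * to_rad / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin((lon2 - lon1) * to_rad / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Split into Nest2017 and Foray rows (the NatalMidden row is dropped)
nests <- df[df$type == "Nest2017", ]
foray <- df[df$type == "Foray", ]

# Inner join (merge's default): pair each Foray with the Nest2017 sharing its NatalMidden
nests_foray <- merge(nests, foray, by = "NatalMidden",
                     suffixes = c(".nest", ".foray"))

# New column with the nest-to-foray distance for every Foray row
nests_foray$dist <- haversine_m(nests_foray$lon.nest, nests_foray$lat.nest,
                                nests_foray$lon.foray, nests_foray$lat.foray)
print(nests_foray[, c("squirrelID.foray", "dist")])

# Grouping is only needed afterwards, e.g. the farthest foray per squirrel
per_squirrel <- aggregate(dist ~ squirrelID.foray, data = nests_foray, FUN = max)
print(per_squirrel)
```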

Calculating distances between grouped subsets using gdist()
