A subset of the data for 2 individuals (squirrelID) can be found here.
My data look like this (only the relevant columns are shown):
lat               lon                 NatalMidden  squirrelID  type
60.9577819984406  -138.0347849708050  -27          NA          Nest2017
60.9574120212346  -138.0345689691600  -27          NA          NatalMidden
60.9578209742904  -138.0346520338210  -27          23054       Foray
60.9575380012393  -138.0348329991100  -27          23054       Foray
60.9576250053942  -138.0339069664480  -27          23054       Foray
60.957643026486   -138.0338829942050  -27          23054       Foray
60.9575670026243  -138.0348739866170  -27          23054       Foray
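For example, squirrelID 23054 was located on a foray (Foray in the type column) multiple times, and I have the corresponding latitude (lat) and longitude (lon) for each Foray. I am trying to calculate the distance between each Foray (type column) and the Nest2017 (type column), separately for each individual (squirrelID column).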
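The following code works (and gives me a value of 15.11501 m), but it requires me to enter each point manually. That would not be a problem, except that I am working with more than 2,000 observations, and the grid, NatalMidden, and squirrelID columns each have more than 2 values.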
library(Imap)
gdist(60.9578209742904,-138.0346520338210, 60.9577819984406, -138.0347849708050, units="m", verbose=FALSE)
Is there a way to work within the dplyr framework, using group_by(squirrelID), and then calculate the distance between each Foray and its corresponding Nest2017 (for rows that share the same NatalMidden)?
My ultimate goal is to create a new column holding the distance between each Foray and the Nest2017 for each squirrelID.
Update:
I tried the following:
nests <- df %>%   # creating a new data frame for Nest2017 points only
  filter(type %in% "Nest2017") %>%
  select(ID, lat, lon, ele, grid, NatalMidden, type)
foray <- df %>%   # creating a new data frame for Foray points only
  filter(type %in% "Foray") %>%
  mutate(sq_id = as.factor(sq_id)) %>%
  group_by(sq_id)
But these subsets do not work in the gdist function (I get an error).
Answer 0 (score: 0)
I am not very familiar with the dplyr package, but I think this does what you are interested in:
library(dplyr)
library(Imap)
# read data from the FigShare linked file
squirrel_data <- read.table("figshare.txt", header = TRUE)
# split into 'Forays' and 'Nests'
nests <- squirrel_data %>%
  filter(type %in% "Nest2017")
foray <- squirrel_data %>%
  filter(type %in% "Foray")
# merge 'Forays' and 'Nests' by 'NatalMidden'
nests_foray <- inner_join(
  nests, foray, by = "NatalMidden", suffix = c(".nest", ".foray"))
# calculate the distance for each row, keep 'squirrelID' and 'dist'
results <- nests_foray %>%
  rowwise() %>%
  mutate(dist = gdist(lat.nest, lon.nest,
                      lat.foray, lon.foray, units = "m")) %>%
  select(squirrelID.foray, dist)
head(results, n = 3)
## A tibble: 3 x 2
#  squirrelID.foray     dist
#             <int>    <dbl>
#1            22684 14.03843
#2            22684 59.06996
#3            22684 13.40567
This is basically what I proposed in my first comment, but using dplyr functions instead. The idea is simply to create an inner join between the "Foray" rows and the "Nest2017" rows by "NatalMidden", then calculate the distance for each row and report it together with the "squirrelID". I hope this helps.
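As a cross-check, here is a minimal sketch of the same join-then-measure idea that does not depend on Imap. The haversine_m helper, the nest_lat, nest_lon and dist_m column names, and the spherical Earth radius of 6,371,000 m are my own assumptions rather than anything from the question; the toy rows are copied from the table in the question. Because the helper is vectorised, no rowwise() step is needed.

```r
library(dplyr)

# Great-circle distance in metres via the haversine formula.
# Assumption: spherical Earth with mean radius 6,371,000 m, so results
# differ slightly from an ellipsoidal method.
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Toy data: the nest row and the first two Foray rows from the question.
squirrel_data <- data.frame(
  lat = c(60.9577819984406, 60.9578209742904, 60.9575380012393),
  lon = c(-138.0347849708050, -138.0346520338210, -138.0348329991100),
  NatalMidden = c(-27, -27, -27),
  squirrelID = c(NA, 23054, 23054),
  type = c("Nest2017", "Foray", "Foray")
)

# One row per nest, with renamed coordinate columns (hypothetical names).
nests <- squirrel_data %>%
  filter(type == "Nest2017") %>%
  select(NatalMidden, nest_lat = lat, nest_lon = lon)

# Attach the nest coordinates to every Foray row, then add the distance.
foray_dist <- squirrel_data %>%
  filter(type == "Foray") %>%
  inner_join(nests, by = "NatalMidden") %>%
  mutate(dist_m = haversine_m(lat, lon, nest_lat, nest_lon))

print(foray_dist)
```

With this formula the first Foray comes out at roughly 8 m from the nest, not the 15.11501 m quoted in the question. As far as I know, Imap documents the function as gdist(lon.1, lat.1, lon.2, lat.2, ...), with longitude first, so it is worth checking the package help to see whether the calls above pass latitude and longitude in the intended order.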