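In the data below (provided as a dput), I have repeated observations (latitude and longitude) for three individuals (IndIDII). Note that each individual has a different number of locations, and that the rows are ordered by IndYear.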
IndIDII IndYear WintLat WintLong
1 BHS_265 BHS_265-2015 47.61025 -112.7210
2 BHS_265 BHS_265-2016 47.59884 -112.7089
3 BHS_770 BHS_770-2016 42.97379 -109.0400
4 BHS_770 BHS_770-2017 42.97129 -109.0367
5 BHS_770 BHS_770-2018 42.97244 -109.0509
6 BHS_377 BHS_377-2015 43.34744 -109.4821
7 BHS_377 BHS_377-2016 43.35559 -109.4445
8 BHS_377 BHS_377-2017 43.35195 -109.4566
9 BHS_377 BHS_377-2018 43.34765 -109.4892
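I want to filter this into a new df in which each IndIDII has two consecutive rows. In my larger dataset, every individual has at least 2 observations (i.e., rows), and each individual has between 2 and 4. Obviously, for individuals with only two rows, the code will simply return those two rows. For individuals with more rows, it should randomly select rows 1 and 2, or 2 and 3, or 3 and 4. The order of the rows doesn't matter as long as they are consecutive (i.e., returning 3 and 4 or 4 and 3 are both fine).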
As always, thanks very much!
Dat <- structure(list(IndIDII = c("BHS_265", "BHS_265", "BHS_770", "BHS_770",
"BHS_770", "BHS_377", "BHS_377", "BHS_377", "BHS_377"), IndYear = c("BHS_265-2015",
"BHS_265-2016", "BHS_770-2016", "BHS_770-2017", "BHS_770-2018",
"BHS_377-2015", "BHS_377-2016", "BHS_377-2017", "BHS_377-2018"
), WintLat = c(47.6102519805014, 47.5988417247191, 42.9737859090909,
42.9712914772727, 42.9724390816327, 43.3474354347826, 43.3555934579439,
43.3519543396226, 43.3476466990291), WintLong = c(-112.720994832869,
-112.708887595506, -109.039964727273, -109.036693522727, -109.050923061224,
-109.482114456522, -109.444522149533, -109.45659254717, -109.489241553398
)), class = "data.frame", row.names = c(NA, -9L))
Answer 0 (score: 2)
Here is a solution using base R functions:
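set.seed(505) # you can set whatever seed you want; I used 505 for reproducibility
lapply(split(Dat, Dat$IndIDII), function(x) {
  ind <- sample(nrow(x))            # random permutation of this individual's row positions
  cons <- if (ind[1] < max(ind)) {  # first drawn row is not the last row:
    c(ind[1], ind[1] + 1)           #   pair it with the next row
  } else {                          # first drawn row is the last row:
    c(ind[1], ind[1] - 1)           #   pair it with the previous row
  }
  x[cons, ]
})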
# $`BHS_265`
# IndIDII IndYear WintLat WintLong
# 1 BHS_265 BHS_265-2015 47.61025 -112.7210
# 2 BHS_265 BHS_265-2016 47.59884 -112.7089
# $BHS_377
# IndIDII IndYear WintLat WintLong
# 6 BHS_377 BHS_377-2015 43.34744 -109.4821
# 7 BHS_377 BHS_377-2016 43.35559 -109.4445
# $BHS_770
# IndIDII IndYear WintLat WintLong
# 3 BHS_770 BHS_770-2016 42.97379 -109.0400
# 4 BHS_770 BHS_770-2017 42.97129 -109.0367
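One property of this rule may matter for the requirement that rows 1 and 2, 2 and 3, or 3 and 4 be chosen at random: a randomly drawn row is paired with the next row unless it is the last one, in which case it is paired with the previous row. For an individual with n rows, the final pair is therefore chosen with probability 2/n and every other pair with probability 1/n. The sketch below checks this by simulation for a 4-row individual. start_answer0() and start_uniform() are hypothetical helper names, and the uniform rule is an assumption about what "random" should mean here, not part of the original answer.

```r
# Start row of the pair chosen by answer 0's rule, for an individual with n rows
start_answer0 <- function(n) {
  first <- sample(n, 1L)                 # same distribution as ind[1] after sample(nrow(x))
  if (first < n) first else first - 1L   # the last row gets paired with the row before it
}

# Hypothetical uniform alternative: draw the start directly from 1:(n - 1)
start_uniform <- function(n) sample.int(n - 1L, 1L)

set.seed(505)
n <- 4L
draws <- 10000L
# Relative frequency of each possible start row (1, 2 or 3) under both rules
p_answer0 <- table(factor(replicate(draws, start_answer0(n)), levels = 1:3)) / draws
p_uniform <- table(factor(replicate(draws, start_uniform(n)), levels = 1:3)) / draws
p_answer0   # roughly 0.25, 0.25, 0.50
p_uniform   # roughly 1/3 each
```

If equal probabilities are wanted, the body of answer 0's function could instead use something like start <- sample.int(nrow(x) - 1L, 1L) followed by x[start + 0:1, ].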
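Answer 1 (score: 2)
You can use ave. Within each group, create a row index (i <- seq_along(x)). To get the first row index to keep, sample one index from all but the last (sample(head(i, -1), 1)). Also include the next row (+ 0:1). Check which row indices are among the sampled ones (i %in% ...). Because ave returns a vector of the same type as its input (here the character column IndIDII), coerce the result back to logical and use it to subset the data.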
Dat[as.logical(ave(Dat$IndIDII, Dat$IndIDII, FUN = function(x){
i <- seq_along(x)
i %in% (sample(head(i, -1), 1) + 0:1)
})), ]
# IndIDII IndYear WintLat WintLong
# 1 BHS_265 BHS_265-2015 47.61025 -112.7210
# 2 BHS_265 BHS_265-2016 47.59884 -112.7089
# 4 BHS_770 BHS_770-2017 42.97129 -109.0367
# 5 BHS_770 BHS_770-2018 42.97244 -109.0509
# 7 BHS_377 BHS_377-2016 43.35559 -109.4445
# 8 BHS_377 BHS_377-2017 43.35195 -109.4566
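Why the as.logical() step is needed: ave() writes each group's result back into a copy of its first argument, so the logical values returned by FUN are coerced to that argument's type. In the answer above the first argument is the character column IndIDII, so ave() returns the strings "TRUE" and "FALSE". A minimal illustration on toy vectors (not the question's data):

```r
# Groups "a", "a", "b"; FUN flags the first element of each group
flags <- ave(c("a", "a", "b"), c("a", "a", "b"),
             FUN = function(x) seq_along(x) == 1)
flags              # character vector: "TRUE" "FALSE" "TRUE"
as.logical(flags)  # logical vector usable for row subsetting: TRUE FALSE TRUE
```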
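The same, but more concise, with data.table and its built-in row index (.I) and per-group row count (.N):
library(data.table)
setDT(Dat)
# For each individual, sample a start position from 1:(.N - 1), map it to a row of Dat
# via .I, and add the following row (+ 0:1). Sampling a position rather than .I itself
# matters because sample(k, 1) on a single number k draws from 1:k.
Dat[Dat[ , .I[sample(.N - 1L, 1L)] + 0:1, by = IndIDII]$V1]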
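A caveat whenever sample() is applied to a vector of row indices: if that vector has length one and holds a number k >= 1, sample(x, 1) draws from 1:k instead of returning k. With data.table's .I, a two-row individual that does not start at row 1 of the table produces exactly such a vector once its last row is dropped. The demonstration below uses a made-up index of 6 and shows a safe pattern that samples a position and then indexes the vector:

```r
# Made-up example: the only non-last row index of a 2-row group starting at row 6
idx <- 6L

set.seed(42)
naive <- replicate(50, sample(idx, 1))                    # draws from 1:6, not just 6
safe  <- replicate(50, idx[sample.int(length(idx), 1L)])  # always returns 6

table(naive)
unique(safe)
```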
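Answer 2 (score: 1)
Here is a somewhat clunky tidyeval approach. It could certainly be improved (what if you want more than two consecutive rows?), but it works for this application. You could also add select() at the end of the function to drop the row column.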
Dat <- structure(list(IndIDII = c("BHS_265", "BHS_265", "BHS_770", "BHS_770", "BHS_770", "BHS_377", "BHS_377", "BHS_377", "BHS_377"), IndYear = c("BHS_265-2015", "BHS_265-2016", "BHS_770-2016", "BHS_770-2017", "BHS_770-2018", "BHS_377-2015", "BHS_377-2016", "BHS_377-2017", "BHS_377-2018"), WintLat = c(47.6102519805014, 47.5988417247191, 42.9737859090909, 42.9712914772727, 42.9724390816327, 43.3474354347826, 43.3555934579439, 43.3519543396226, 43.3476466990291), WintLong = c(-112.720994832869, -112.708887595506, -109.039964727273, -109.036693522727, -109.050923061224, -109.482114456522, -109.444522149533, -109.45659254717, -109.489241553398)), class = "data.frame", row.names = c(NA, -9L))
library(tidyverse)
set.seed(123)
sample_2_consecutive <- function(tbl, group_col){
group_col <- enquo(group_col)
with_rownums <- tbl %>%
group_by(!!group_col) %>%
mutate(row = row_number())
rows_to_keep <- with_rownums %>%
filter(row != max(row)) %>%
sample_n(1) %>%
mutate(row2 = row + 1) %>%
gather(key, row, row, row2)
with_rownums %>%
semi_join(rows_to_keep, by = c(quo_name(quo(!!group_col)), "row")) %>%
arrange(!!group_col, row) %>%
ungroup() # %>%
# select(-row)
}
sample_2_consecutive(Dat, IndIDII)
#> # A tibble: 6 x 5
#> IndIDII IndYear WintLat WintLong row
#> <chr> <chr> <dbl> <dbl> <int>
#> 1 BHS_265 BHS_265-2015 47.6 -113. 1
#> 2 BHS_265 BHS_265-2016 47.6 -113. 2
#> 3 BHS_377 BHS_377-2017 43.4 -109. 3
#> 4 BHS_377 BHS_377-2018 43.3 -109. 4
#> 5 BHS_770 BHS_770-2016 43.0 -109. 1
#> 6 BHS_770 BHS_770-2017 43.0 -109. 2
Created on 2018-09-27 by the reprex package (v0.2.0).
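On the open question in this answer (more than two consecutive rows), here is a base R sketch rather than a tidyverse one. sample_k_consecutive() is a hypothetical helper, not part of any package: for each group it draws a start uniformly from the positions that leave room for k rows and keeps the k rows from there. Individuals with fewer than k rows are dropped, which is an assumption; the question guarantees at least 2 rows, so this only matters for k > 2. The data is a trimmed copy of the question's Dat with the coordinate columns omitted.

```r
# Trimmed copy of the question's data (coordinate columns omitted)
Dat <- data.frame(
  IndIDII = c("BHS_265", "BHS_265", "BHS_770", "BHS_770", "BHS_770",
              "BHS_377", "BHS_377", "BHS_377", "BHS_377"),
  IndYear = c("BHS_265-2015", "BHS_265-2016", "BHS_770-2016", "BHS_770-2017",
              "BHS_770-2018", "BHS_377-2015", "BHS_377-2016", "BHS_377-2017",
              "BHS_377-2018"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep k consecutive rows per group, with a uniformly random start
sample_k_consecutive <- function(df, group, k = 2L) {
  # Row positions of each group, in their original order
  rows_by_group <- split(seq_len(nrow(df)), df[[group]])
  keep <- unlist(lapply(rows_by_group, function(rows) {
    n_starts <- length(rows) - k + 1L
    if (n_starts < 1L) return(integer(0))  # fewer than k rows: drop the group (assumption)
    start <- sample.int(n_starts, 1L)      # each valid start is equally likely
    rows[start:(start + k - 1L)]
  }), use.names = FALSE)
  df[sort(keep), , drop = FALSE]           # restore the original row order
}

set.seed(123)
res2 <- sample_k_consecutive(Dat, "IndIDII")          # one consecutive pair per individual
res3 <- sample_k_consecutive(Dat, "IndIDII", k = 3L)  # three consecutive rows; BHS_265 is dropped
res2
res3
```

With the default k = 2 this returns one consecutive pair per individual, as the question asks. The sort(keep) step is there because split() orders the groups alphabetically, not in their original order.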