Question

Suppose I have a tibble tbl_

tbl_ <- tibble(id = c(1,1,2,2,3,3), dta = 1:6)
tbl_
# A tibble: 6 x 2
     id   dta
  <dbl> <int>
1     1     1
2     1     2
3     2     3
4     2     4
5     3     5
6     3     6

There are 3 id groups. I want to resample whole id groups with replacement, 3 times. For example, the resulting tibble could be:

     id   dta
  <dbl> <int>
1     1     1
2     1     2
3     1     1
4     1     2
5     3     5
6     3     6

but not

     id   dta
  <dbl> <int>
1     1     1
2     1     2
3     1     1
4     2     4
5     3     5
6     3     6

or

     id   dta
  <dbl> <int>
1     1     1
2     1     1
3     2     3
4     2     4
5     3     5
6     3     6

Answer 1

One option is to get the minimum row number for each id. Those row numbers are then sampled with replace = TRUE.

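library(dplyr)

# First row number of each id group
tbl_ %>% mutate(rn = row_number()) %>%
  group_by(id) %>%
  summarise(minrow = min(rn)) -> min_row

# Sample group start rows with replacement, then expand each start row
# to both rows of its group (assumes exactly 2 rows per id)
indx <- rep(sample(min_row$minrow, nrow(min_row), replace = TRUE), each = 2) +
        rep(c(0,1), 3)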

tbl_[indx,]
# # A tibble: 6 x 2
#    id     dta
#   <dbl>  <int>
# 1  1.00     1
# 2  1.00     2
# 3  3.00     5
# 4  3.00     6
# 5  2.00     3
# 6  2.00     4

Note: the answer above assumes 2 rows per id, but it works for any number of ids. To handle more than 2 rows per id, the hard-coded each = 2 and c(0,1) need to be changed.
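
One way to drop the two-rows-per-id assumption is to keep the row indices of each group in a list and sample the groups themselves. This is only a sketch, not part of the original answer; the names rows_by_id, picked and boot_tbl are made up for illustration, and set.seed is there only so the example is repeatable.

```r
library(tibble)

tbl_ <- tibble(id = c(1, 1, 2, 2, 3, 3), dta = 1:6)

# Row indices of tbl_, grouped by id (groups may have different sizes).
rows_by_id <- split(seq_len(nrow(tbl_)), tbl_$id)

# Draw as many groups as there are ids, with replacement.
set.seed(1)  # only for reproducibility of this sketch
picked <- sample(names(rows_by_id), length(rows_by_id), replace = TRUE)

# Concatenate the row indices of the drawn groups and subset the tibble.
indx <- unlist(rows_by_id[picked], use.names = FALSE)
boot_tbl <- tbl_[indx, ]
boot_tbl
```

Because whole index vectors are drawn, every sampled id always brings all of its rows along, whatever the group size.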

Answer 2

Here is an option using distinct and sample_n.
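
The code for this answer is not shown here. As a rough sketch of how distinct and sample_n might be combined (the join back to tbl_ and the names n_ids and boot_tbl are my assumptions, not necessarily what the answer used):

```r
library(dplyr)

tbl_ <- tibble(id = c(1, 1, 2, 2, 3, 3), dta = 1:6)

# Number of id groups to draw (3 in the question).
n_ids <- n_distinct(tbl_$id)

boot_tbl <- tbl_ %>%
  distinct(id) %>%                      # one row per id
  sample_n(n_ids, replace = TRUE) %>%   # resample ids with replacement
  inner_join(tbl_, by = "id")           # pull in every row of each drawn id
boot_tbl
```

The join keeps the order of the sampled ids, so rows of one draw stay together. Recent dplyr versions may warn about a many-to-many relationship when the same id is drawn more than once; that duplication is intended here.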

Bootstrapping by group in a tibble