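In a data.table, I would like to flag N randomly selected rows per group (C1) in a new column C3. Several similar questions have already been asked on SO here, here and here, but I still cannot work out a solution for my task from their answers.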
library(data.table)  # provides data.table(), .I, .N and :=
set.seed(1)
dt = data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                C2 = c(2,1,3,1,2,3,4,5,4,5))
dt
C1 C2
1: A 2
2: A 1
3: A 3
4: B 1
5: C 2
6: C 3
7: C 4
8: D 5
9: D 4
10: D 5
Here are the row indices of two randomly selected rows per group of C1 (this does not work for group B):
dt[, sample(.I, min(.N, 2)), by = C1]$V1
[1] 1 3 3 7 5 10 9
Note: for B, only one row should be selected, because group B contains only one row.
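The stray index for group B most likely comes from a documented quirk of sample(): when its first argument is a single number x >= 1, it samples from 1:x, so sample(4, 1) can return any of 1 to 4 rather than 4 itself. A minimal sketch of one way around this samples positions with sample.int() and then indexes into the vector. The helper name pick_rows is hypothetical and only used for illustration.

```r
library(data.table)

dt <- data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                 C2 = c(2,1,3,1,2,3,4,5,4,5))

# Hypothetical helper: draw up to `size` elements of `x` by sampling
# positions, so a length-one `x` is never expanded to 1:x.
pick_rows <- function(x, size) {
  x[sample.int(length(x), min(length(x), size))]
}

# Up to two row indices per group; group B can only contribute row 4.
idx <- dt[, pick_rows(.I, 2), by = C1]$V1
idx
```

Which rows are drawn depends on the random seed, but the third index is always 4, because group B has no other row to offer.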
Here is a solution for one randomly selected row per group, which often does not work for group B:
dt[, C3 := .I == sample(.I, 1), by = C1]
dt
C1 C2 C3
1: A 2 FALSE
2: A 1 TRUE
3: A 3 FALSE
4: B 1 FALSE
5: C 2 TRUE
6: C 3 FALSE
7: C 4 FALSE
8: D 5 TRUE
9: D 4 FALSE
10: D 5 FALSE
What I actually want is to extend this to N rows. I tried (for two rows):
dt[, C3 := .I==sample(.I, min(.N, 2)), by = C1]
This, of course, does not work.
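The attempt fails because == compares two vectors position by position and recycles the shorter one, whereas flagging rows needs a membership test. The small sketch below illustrates the difference on plain vectors. The values in group_rows and picked are made up for illustration and are not taken from an actual sample() call.

```r
# Row indices of one group and a hypothetical draw of two of them.
group_rows <- c(5L, 6L, 7L)
picked <- c(7L, 5L)

# `==` compares position by position and recycles `picked` to 7, 5, 7,
# so only the last comparison matches (R also warns about the lengths).
by_equality <- suppressWarnings(group_rows == picked)

# `%in%` asks, for each row index, whether it occurs anywhere in `picked`.
by_membership <- group_rows %in% picked

print(by_equality)    # [1] FALSE FALSE  TRUE
print(by_membership)  # [1]  TRUE FALSE  TRUE
```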
Any help is much appreciated!
Answer 0 (score: 1)
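N = 2  # number of rows to flag per group
# Flag every row when the group has fewer than N rows; otherwise flag N random positions within the group.
dt[, C3 := { if (.N < N) rep(TRUE, .N) else 1:.N %in% sample(.N, N) }, by = C1]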
dt
# C1 C2 C3
# 1: A 2 TRUE
# 2: A 1 FALSE
# 3: A 3 TRUE
# 4: B 1 TRUE
# 5: C 2 FALSE
# 6: C 3 TRUE
# 7: C 4 TRUE
# 8: D 5 TRUE
# 9: D 4 TRUE
# 10: D 5 FALSE
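A possible variant, offered only as a sketch and not as part of the answer above, drops the if/else branch by capping the sample size at the group size with min(.N, N). It is followed by a per-group count to check that each group has the expected number of flagged rows. Using sample.int() and seq_len() here is a choice made for this sketch.

```r
library(data.table)

set.seed(1)
dt <- data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                 C2 = c(2,1,3,1,2,3,4,5,4,5))

N <- 2  # number of rows to flag per group

# Draw min(.N, N) positions within each group and flag those positions.
dt[, C3 := seq_len(.N) %in% sample.int(.N, min(.N, N)), by = C1]

# Sanity check: number of flagged rows per group.
flag_counts <- dt[, .(flagged = sum(C3)), by = C1]
flag_counts
```

Whatever the seed, flag_counts should show 2 flagged rows for groups A, C and D and 1 for group B.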