I have a problem similar to this one:
Weighted sampling with 2 vectors
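I have a dataset with 1000 observations and 4 columns. I want to draw 200 observations, with replacement, from the original dataset.
The problem is that I need a different probability vector for each column. For example, for the first column I want equal probabilities, c(0.001, 0.001, 0.001, 0.001, ...). For the second column I want something different, such as c(0.0005, 0.0002, ...). Each probability vector sums to 1, of course.
I know sample() can take a single probability vector, but I'm not sure how to use a different one for each column. Please help!
Thanks in advance! Colamonkey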
Answer 0 (score: 0)
# In your case there are 1000 rows and 4 columns; this toy example
# (4 rows, 3 columns) just shows the procedure. Column k of samp_prob
# is the probability vector used for column k of df.
samp_prob <- data.frame(A = rep(.25, 4), B = c(.5, .1, .2, .2), C = c(.3, .6, .05, .05))
df <- data.frame(a = 1:4, b = 2:5, c = 3:6)
sam <- mapply(function(x, y) sample(x, 200, replace = TRUE, prob = y), df, samp_prob)
head(sam)  # the draws are random, so your values will differ
a b c
[1,] 4 5 6
[2,] 1 2 4
[3,] 1 2 4
[4,] 4 4 4
[5,] 4 4 4
[6,] 1 2 4
# you can also write (it is equivalent):
mapply(df, samp_prob, FUN = sample, size = 200, replace = TRUE)
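A minimal sketch of the same mapply approach scaled up to the 1000-row, 4-column case from the question. The column names w, x, y, z, the random data and the weight vectors are hypothetical placeholders, not the asker's data. The explicit normalization is optional: R's sample() treats prob as weights that need not sum to one (they must be non-negative and not all zero), but normalizing makes the "each vector sums to 1" requirement checkable.

```r
# Hypothetical stand-in for the asker's data: 1000 observations, 4 columns.
set.seed(1)  # only so the draws are reproducible
n_obs <- 1000
df <- data.frame(
  w = rnorm(n_obs),
  x = runif(n_obs),
  y = rpois(n_obs, 3),
  z = seq_len(n_obs)
)

# Hypothetical weights: one vector per column, each of length n_obs.
raw_weights <- data.frame(
  w = rep(1, n_obs),        # equal probabilities (1/1000 each after scaling)
  x = seq_len(n_obs),       # later rows more likely
  y = rev(seq_len(n_obs)),  # earlier rows more likely
  z = runif(n_obs)          # arbitrary positive weights
)
# Scale each column so it sums to 1.
samp_prob <- as.data.frame(lapply(raw_weights, function(p) p / sum(p)))

# Sanity checks: same shape as the data, every probability column sums to 1.
stopifnot(
  identical(dim(samp_prob), dim(df)),
  isTRUE(all.equal(unname(colSums(samp_prob)), rep(1, ncol(df))))
)

# Draw 200 values with replacement from each column using that column's
# own probability vector. Each column is sampled independently.
sam <- mapply(function(values, p) sample(values, 200, replace = TRUE, prob = p),
              df, samp_prob)

dim(sam)  # 200 rows, one column per original column
```

Because every column is sampled independently, row i of sam is not one original observation; it combines values drawn from different rows. Also note that mapply simplifies the result to a matrix, so if the columns have different types (for example one character column), everything is coerced to a common type; in that case passing SIMPLIFY = FALSE and wrapping the result in as.data.frame() keeps each column's type.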