Sample from different columns in R

Time: 2018-07-05 10:51:19

Tags: r dataframe sample

I have a probability vector, for example

prob=c(0.1,0.8,0.1)

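and a data frame: df=cbind(c("A","B","A"),c(1,2,3),c("q","v","z"))

I want to sample n objects from df with replacement, where the first column has probability 0.1, the second column 0.8, and the third column 0.1.

2 Answers:

Answer 0: (score: 1)

We unlist the data.frame and adapt the prob vector on the fly so that it has the appropriate length, one weight per cell.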

df <- data.frame(c("A","B","A"), c(1,2,3), c("q","v","z"), stringsAsFactors = FALSE)

n <- 5
set.seed(1)
unname(sample(unlist(df), n, replace = TRUE, prob= rep(prob, each = nrow(df))))
# [1] "3" "1" "A" "z" "2"
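To see why `rep(prob, each = nrow(df))` lines up with the flattened data, recall that `unlist()` walks a data.frame column by column, so each column weight has to be repeated once per row. The weights passed to `sample()` do not need to sum to one. The following is only a minimal sketch of that alignment; the names `w`, `col_id` and `col_share` are introduced here for illustration and are not part of the answer.

```r
prob <- c(0.1, 0.8, 0.1)
df <- data.frame(c("A","B","A"), c(1,2,3), c("q","v","z"), stringsAsFactors = FALSE)

# One weight per cell, in the same column-by-column order that unlist() uses
w <- rep(prob, each = nrow(df))

# Column index of every flattened cell (illustrative helper)
col_id <- rep(seq_len(ncol(df)), each = nrow(df))

# Share of the total weight that falls on each column;
# after normalisation this matches prob
col_share <- as.numeric(tapply(w, col_id, sum) / sum(w))
col_share
```

Because every column has the same number of rows here, repeating the raw weights is enough; columns of unequal length would need the weights divided by the column length first.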

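If you really are starting from a matrix rather than a data.frame, it is a bit shorter, because a matrix can be sampled directly without unlist: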

df=cbind(c("A","B","A"),c(1,2,3),c("q","v","z"))
set.seed(1)
sample(df, n, replace = TRUE, prob= rep(prob, each = nrow(df)))
# [1] "3" "1" "A" "z" "2"

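From a list (in response to a comment), where the elements may have different lengths, so each group's probability is divided by its length: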

l =list(c("A","B"),c(1,2,3),c("q","v","z","w"))
set.seed(1)
sample(unlist(l), n, replace = TRUE, prob= rep(prob/lengths(l), lengths(l)))
# [1] "3" "2" "1" "v" "3" "B" "q"
# (output as posted; it shows 7 draws, so n was presumably 7 rather than 5 for this run)
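The expression `rep(prob/lengths(l), lengths(l))` spreads each group's probability evenly over its elements. As a hedged check, assuming the same `prob` and `l` as above (the names `w`, `grp` and `grp_total` are illustrative only), the per-element weights can be summed back up by group:

```r
prob <- c(0.1, 0.8, 0.1)
l <- list(c("A","B"), c(1,2,3), c("q","v","z","w"))

# Per-element weight: the group's probability divided by the group size,
# repeated once for every element of that group
w <- rep(prob / lengths(l), lengths(l))

# Group index of every element of unlist(l) (illustrative helper)
grp <- rep(seq_along(l), lengths(l))

# Summing the weights within each group recovers prob
grp_total <- as.numeric(tapply(w, grp, sum))
grp_total
```

Without the division, longer groups would receive more total weight than intended, since each of their elements would carry the full group probability.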

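Answer 1: (score: 1)

This is based on the assumption that the sampling probability within a column is uniform.

We first sample n column positions using the probabilities in the vector prob: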

df=cbind(c("A","B","A"),c(1,2,3),c("q","v","z"))
prob=c(0.1,0.8,0.1)
n = 10

set.seed(1)
colselect <- sample(1:ncol(df), size = n, replace = TRUE, prob = prob)

[1] 2 2 2 1 2 3 1 2 2 2

Then we iterate over the sampled column positions and draw one element from the corresponding column each time:

sapply(colselect, function(x) sample(df[,x], 1))

[1] "1" "1" "3" "B" "3" "v" "A" "3" "2" "3"
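The same two-stage idea can also be written without the `sapply` loop by drawing a row index for every draw and indexing the matrix with (row, column) pairs. This is only a sketch of an alternative, not the answerer's code; the names `cols`, `rows`, `draws` and `col_freq` are introduced here for illustration, and `n` is raised to 10000 so the empirical column frequencies can be compared with `prob`.

```r
df <- cbind(c("A","B","A"), c(1,2,3), c("q","v","z"))
prob <- c(0.1, 0.8, 0.1)
n <- 10000

set.seed(1)
# Stage 1: pick a column for every draw according to prob
cols <- sample(seq_len(ncol(df)), size = n, replace = TRUE, prob = prob)
# Stage 2: pick a row uniformly for every draw
rows <- sample(seq_len(nrow(df)), size = n, replace = TRUE)

# A two-column index matrix selects one cell per (row, column) pair
draws <- df[cbind(rows, cols)]

# Empirical share of draws per column; should be close to prob
col_freq <- as.numeric(table(factor(cols, levels = seq_len(ncol(df)))) / n)
col_freq
```

Sampling indices rather than values also sidesteps the `sample()` behaviour where a single numeric value `x` is treated as `1:x`, which could matter if the approach were reused on numeric data with one row.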