R - Draw random samples with fixed probability

Time: 2015-10-05 11:36:36

Tags: r replace sample

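I'm trying to write a loop in R that creates 200 data frames by randomly sampling from an original dataset. I'd like to sample with replacement, with a fixed proportion of 10% men (coded as 1) and 90% women (coded as 0) in the variable SEX, and with the same number of rows as the original dataset.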

This is how far I've got:

for (i in 1:200) {

 smpl[i] <- data[sample(nrow(data), nrow(data), replace=T, prob=ifelse(data$SEX==1,0.1,0.9)),] 

}

Unfortunately, this code doesn't work...

First, the code that draws the random sample does not keep the ratio of men to women at 0.1:0.9.

Second, when I try to loop over the command, I get this message:

Warning in `[<-.data.frame`(`*tmp*`, i, value = list(ID = c(32604L, 11645L, :   provided 41 variables to replace 1 variables

Can anyone help?

1 Answer:

Answer 0: (score: 1)

First, some sample data:

## Sample data
nMen <- 50
nWomen <- 60

set.seed(124)

mydata <- data.frame(SEX = rep(c("female", "male"), times = c(nWomen, nMen)),
    myValue = rnorm(nMen + nWomen), ID = seq_len(nMen + nWomen))

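Then calculate how many women and men you want in each sample. These must be whole numbers, because they are passed to sample() as the number of rows to draw.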

## Number of women and men for the sampling
nSampW <- (nWomen + nMen) * 0.9
nSampM <- (nWomen + nMen) * 0.1
## These should be integer (the following should be TRUE)
nSampW %% 1 ==0
nSampM %% 1 ==0
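
If the total number of rows does not split cleanly into 10% and 90%, the counts need rounding before they can be used as sample sizes. A minimal sketch, assuming a hypothetical total `nTotal` of 105 rows (not from the answer's data): round the smaller group and give the remainder to the larger one, so the two counts still add up to the original number of rows. The names ending in `_rounded` are introduced here for illustration only.

```r
## Hypothetical total that does not split cleanly into 10% / 90%
nTotal <- 105

## Round the smaller group, then give the remainder to the larger group
nSampM_rounded <- round(nTotal * 0.1)
nSampW_rounded <- nTotal - nSampM_rounded

## The two counts always add up to the original number of rows
nSampM_rounded + nSampW_rounded == nTotal
```

With rounded counts the proportion of men is only approximately 10%, which may or may not be acceptable depending on the analysis.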

Then set up the container for the results. The following creates a list with room for 200 samples:

## Set up results list
mySamp <- vector(mode = "list", length = 200)
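
A list is used because each of its elements can hold a whole data frame. As a small illustration (the names `res` and `df` are made up for this sketch), assigning with double brackets stores the data frame as a single element:

```r
## Illustrative names only: `res` and `df` are not part of the answer's data
res <- vector(mode = "list", length = 2)
df <- data.frame(a = 1:3, b = letters[1:3])

## Double brackets store the whole data frame as a single list element
res[[1]] <- df

is.data.frame(res[[1]])  # the element is still a data frame
is.null(res[[2]])        # slots that were not filled stay NULL
```

The warning in the question comes from `[<-.data.frame`, which suggests `smpl` was itself a data frame there, so `smpl[i] <- ...` tried to squeeze all 41 columns of the sample into one column.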

Then loop: from the row indices of each sex, draw the numbers of women and men calculated above.

## The loop
for(i in seq_along(mySamp)) {
## Get indices by SEX
    idxW <- which(mydata$SEX == "female")
    idxM <- which(mydata$SEX == "male")
## Sample corresponding number of rows from those indexes with replacement
    tempW <- mydata[sample(idxW, nSampW, replace = TRUE), ]
    tempM <- mydata[sample(idxM, nSampM, replace = TRUE), ]
## rbind back together and assign
    mySamp[[i]] <- rbind(tempW, tempM)
}

Then check that the proportions are correct:

# sapply(mySamp[1:10], function(x) prop.table(table(x$SEX)))
#        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# female  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9   0.9
# male    0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
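
For comparison, here is a rough sketch of why the `prob` approach from the question does not hold the ratio fixed. `sample()` treats `prob` as per-row weights, so the share of men varies from draw to draw, and its expected value also depends on how many men and women are in the original data. The sketch rebuilds the example data so it runs on its own; `w` and `propMale` are names introduced here for illustration.

```r
## Recreate the example data so the sketch runs on its own
nMen <- 50
nWomen <- 60
set.seed(124)
mydata <- data.frame(SEX = rep(c("female", "male"), times = c(nWomen, nMen)),
    myValue = rnorm(nMen + nWomen), ID = seq_len(nMen + nWomen))

## Per-row weights as in the question: 0.1 for each man, 0.9 for each woman
w <- ifelse(mydata$SEX == "male", 0.1, 0.9)

## Share of men in each of 200 weighted samples
propMale <- replicate(200, {
    idx <- sample(nrow(mydata), nrow(mydata), replace = TRUE, prob = w)
    mean(mydata$SEX[idx] == "male")
})

## The share is not constant across samples
range(propMale)

## Expected share of men implied by the weights
sum(w[mydata$SEX == "male"]) / sum(w)
```

With 50 men and 60 women the weights add up to 5 for the men and 54 for the women, so the expected share of men is 5/59 rather than 0.1. Sampling a fixed number of rows within each sex, as in the loop above, avoids both problems.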