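I am trying to write a loop in R that creates 200 data frames by random sampling from an original dataset. I want to sample with replacement, with a fixed proportion of 10% men (coded 1) and 90% women (coded 0) in the variable SEX, and with the same number of rows as the original dataset.
This is how far I have got: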
for (i in 1:200) {
smpl[i] <- data[sample(nrow(data), nrow(data), replace=T, prob=ifelse(data$SEX==1,0.1,0.9)),]
}
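Unfortunately, this code does not work.
First, the code that draws the random sample does not keep the ratio of men to women at 0.1:0.9.
Second, when I try to run the command in a loop, I get this warning: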
Warning in `[<-.data.frame`(`*tmp*`, i, value = list(ID = c(32604L, 11645L, :
  provided 41 variables to replace 1 variables
Can anyone help?
Answer 0 (score: 1)
First, some sample data:
## Sample data
nMen <- 50
nWomen <- 60
set.seed(124)
mydata <- data.frame(SEX = rep(c("female", "male"), times = c(nWomen, nMen)),
                     myValue = rnorm(nMen + nWomen), ID = seq_len(nMen + nWomen))
Then compute how many women and men you want in each sample. These must be whole numbers.
## Number of women and men for the sampling
nSampW <- (nWomen + nMen) * 0.9
nSampM <- (nWomen + nMen) * 0.1
## These should be integer (the following should be TRUE)
nSampW %% 1 ==0
nSampM %% 1 ==0
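The `%% 1 == 0` check only passes when 10% of the row count happens to be a whole number. A minimal sketch of one way around that, assuming it is acceptable to round the number of men and give the remaining rows to women (`nTotal`, `nSampM2` and `nSampW2` are names made up for this illustration, not part of the answer's code):

```r
## Hypothetical row count that is not a multiple of 10
nTotal <- 113
## Round the male share to a whole number of rows, give the rest to women
nSampM2 <- round(nTotal * 0.1)
nSampW2 <- nTotal - nSampM2
## Both counts are whole numbers and add up to the original row count
c(women = nSampW2, men = nSampM2)
```

With this approach the share of men is only approximately 10% whenever the row count is not divisible by 10.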
Then set up a container for the results. The following creates a list with room for 200 samples:
## Set up results list
mySamp <- vector(mode = "list", length = 200)
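The list matters because of how assignment works. The warning in the question comes from `[<-.data.frame`, which suggests `smpl` was a data frame and `smpl[i] <-` tried to overwrite a single column with a whole data frame. A small sketch of the list-based alternative, using a toy data frame (`df` and `res` are illustrative names only):

```r
## Toy data frame standing in for one sample
df <- data.frame(a = 1:3, b = 4:6)

## A list with one empty slot per sample
res <- vector(mode = "list", length = 2)

## [[i]] stores the whole data frame as element i of the list
res[[1]] <- df
res[[2]] <- df[c(3, 3, 1), ]

## Each element is still a complete data frame
sapply(res, is.data.frame)
sapply(res, nrow)
```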
Then loop, drawing the numbers of women and men computed above from the row indices of each sex:
## The loop
for (i in seq_along(mySamp)) {
  ## Get indices by SEX
  idxW <- which(mydata$SEX == "female")
  idxM <- which(mydata$SEX == "male")
  ## Sample corresponding number of rows from those indexes with replacement
  tempW <- mydata[sample(idxW, nSampW, replace = TRUE), ]
  tempM <- mydata[sample(idxM, nSampM, replace = TRUE), ]
  ## rbind back together and assign
  mySamp[[i]] <- rbind(tempW, tempM)
}
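The same idea can also be written without an explicit loop. The following is only a sketch: `sampleFixed`, `toy` and `samples` are hypothetical names, and the data simply mirror the example above. It assumes each sex has more than one row, because `sample()` treats a single number `n` as `1:n`.

```r
## Hypothetical helper: draw one sample with fixed numbers of women and men
sampleFixed <- function(d, nW, nM) {
  idxW <- which(d$SEX == "female")
  idxM <- which(d$SEX == "male")
  ## Combine the two index draws, then subset the data frame once
  d[c(sample(idxW, nW, replace = TRUE),
      sample(idxM, nM, replace = TRUE)), ]
}

## Toy data with 60 women and 50 men
set.seed(124)
toy <- data.frame(SEX = rep(c("female", "male"), times = c(60, 50)),
                  myValue = rnorm(110), ID = seq_len(110))

## 200 samples, each with 99 women and 11 men
samples <- lapply(seq_len(200), function(i) sampleFixed(toy, nW = 99, nM = 11))
```

`lapply()` returns the list directly, so no container has to be allocated beforehand.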
Then check that the proportions are correct:
sapply(mySamp[1:10], function(x) prop.table(table(x$SEX)))
#        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# female  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9   0.9
# male    0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
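For contrast, here is a sketch of why the `prob` argument used in the question cannot hold the ratio fixed. `prob` assigns a weight to each row, so the share of men in a sample is random, and its expected value depends on how many men and women the data contain. The data and names below (`toy2`, `w`, `propMale`, `expectedShare`) are made up for illustration, with SEX coded 0/1 as in the question:

```r
set.seed(1)
## Toy data: 60 women (0) and 50 men (1)
toy2 <- data.frame(SEX = rep(c(0, 1), times = c(60, 50)))

## Per-row weights as in the question
w <- ifelse(toy2$SEX == 1, 0.1, 0.9)

## Share of men in each of 20 weighted samples drawn with replacement
propMale <- replicate(20, {
  idx <- sample(nrow(toy2), nrow(toy2), replace = TRUE, prob = w)
  mean(toy2$SEX[idx] == 1)
})

## Expected share of men under these weights: 50 * 0.1 / (50 * 0.1 + 60 * 0.9)
expectedShare <- sum(w[toy2$SEX == 1]) / sum(w)

range(propMale)
expectedShare
```

Here the expected share is 5/59, not 0.1, and individual samples scatter around it. Sampling a fixed number of rows from each sex separately avoids both problems.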