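I would like to simulate an age variable (constrained to the range 18-35) that is correlated with an existing binary variable, use. Most of the examples I have come across demonstrate how to simulate both variables at the same time.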
# setup
set.seed(493)
n <- 134
dat <- data.frame(partID = seq(1, n, 1),
                  trt = c(rep(0, n/2),
                          rep(1, n/2)))
# set proportion
a <- .8
b <- .2
dat$use <- c(rbinom(n/2, 1, b),
             rbinom(n/2, 1, a))
Answer 0 (score: 3)
I'm not sure this is the best way to approach the problem, but you could use the answer here: https://stats.stackexchange.com/questions/15011/generate-a-random-variable-with-a-defined-correlation-to-an-existing-variable
For example (using the code from the link):
x1 <- dat$use # fixed given data
rho <- 0.1 # desired correlation = cos(angle)
theta <- acos(rho) # corresponding angle
x2 <- rnorm(n, 2, 0.5) # new random data
X <- cbind(x1, x2) # matrix
Xctr <- scale(X, center=TRUE, scale=FALSE) # centered columns (mean 0)
Id <- diag(n) # identity matrix
Q <- qr.Q(qr(Xctr[ , 1, drop=FALSE])) # QR-decomposition, just matrix Q
P <- tcrossprod(Q) # = Q Q' # projection onto space defined by x1
x2o <- (Id-P) %*% Xctr[ , 2] # x2ctr made orthogonal to x1ctr
Xc2 <- cbind(Xctr[ , 1], x2o) # bind to matrix
Y <- Xc2 %*% diag(1/sqrt(colSums(Xc2^2))) # scale columns to length 1
x <- Y[ , 2] + (1 / tan(theta)) * Y[ , 1] # final new vector
dat$age <- (1 + x) * 25 # linear rescale: mean 25; the 18-35 bounds are not enforced here
cor(dat$use, dat$age)
# 0.1
summary(dat$age)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 20.17 23.53 25.00 25.00 26.59 30.50
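The rescaling `(1 + x) * 25` happens to land inside 18-35 for this seed, but nothing in it enforces the bounds. Because a correlation is unchanged by any positive linear transformation, one option is to map the simulated vector onto the target interval with a min-max rescale. The following is a rough sketch under that assumption, not part of the linked answer: `sim_bounded` and its arguments `lo` and `hi` are hypothetical names, and `resid(lm(...))` is swapped in for the explicit QR projection to get the component orthogonal to `use`.

```r
set.seed(493)
n <- 134
use <- c(rbinom(n/2, 1, 0.2), rbinom(n/2, 1, 0.8))  # same setup as in the question

# Hypothetical helper: builds a vector whose sample correlation with x1 is rho,
# then stretches it linearly so that its minimum is lo and its maximum is hi.
sim_bounded <- function(x1, rho, lo, hi) {
  x1c <- x1 - mean(x1)                               # centered fixed variable
  x2o <- unname(resid(lm(rnorm(length(x1)) ~ x1)))   # random part orthogonal to x1
  y1  <- x1c / sqrt(sum(x1c^2))                      # scale both to length 1
  y2  <- x2o / sqrt(sum(x2o^2))
  x   <- y2 + y1 / tan(acos(rho))                    # same construction as above
  # positive linear map onto [lo, hi]; this leaves the correlation unchanged
  lo + (x - min(x)) / (max(x) - min(x)) * (hi - lo)
}

age <- sim_bounded(use, rho = 0.1, lo = 18, hi = 35)
cor(use, age)   # 0.1, up to floating-point error
range(age)      # minimum 18, maximum 35
```

Note that this pins the smallest and largest simulated ages exactly at 18 and 35 and no longer fixes the mean at 25. Rounding to whole years afterwards, for example with `round(age)`, would shift the correlation slightly away from the target.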