Writing a program to introduce permutations

Asked: 2015-05-29 13:49:37

Tags: r for-loop matching logistic-regression survival-analysis

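Basically, I want to write a program that randomizes the order of my data n times, then runs the survival analysis and plots the output for each of the n runs.

So let's take the following generic data from the Matching package and create a data set of treated and untreated people.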

set.seed(123)

library(Matching)
data(lalonde)

lalonde$age_cat <- with(lalonde, ifelse(age < 24, 1, 2))
attach(lalonde)

lalonde$ID <- 1:length(lalonde$age)


#The covariates we want to match on
X = cbind(age_cat, educ, black, hisp, married, nodegr, u74, u75, re75, re74)
#The covariates we want to obtain balance on
BalanceMat <- cbind(age_cat, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)
detach(lalonde)

# now let's pair the non-treated observations to the treated
# BUT let's pair WITHOUT REPLACEMENT

mout <- Match(Y=NULL, Tr=lalonde$treat, X=X,
              Weight.matrix=genout, M=2,
              replace=FALSE, ties=TRUE)

summary(mout)
# we see that for 130 treated observations, we have 260 non-treated
# this is because we set M=2
# (and yes, length(lalonde$age[lalonde$treat==0]) == 260, but bear with me:
# this was done for a specific reason)

# now let's create a table for our 130+260 observations
treated <- lalonde[mout$index.treated,]
# now we only want one occurrence of each treated observation
library(dplyr)
treat_clean <- treated %>%
  group_by(ID) %>%
  slice(1)

non.treated <- lalonde[mout$index.control,]

# finally we can combine to form one clear data.set
matched.data <- rbind(treat_clean, non.treated)

We can now run a conditional logistic regression to determine the OR associated with re78 (money earned in 1978) and treatment. For this we need the survival package.

library(survival)

Suppose an individual counts as a success if they earned more than 8125 in 1978:

matched.data$success <- with(matched.data, ifelse(re78 > 8125, 1, 0))

output <- clogit(success ~ treat, matched.data, method = 'efron')

summary(output)

So we see that the OR for treatment (treat = 1) is 1.495.

We can save this as:

iteration.1 <- exp(output$coefficients[1])

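Now, the documentation of the Matching package says the following about replace = FALSE: note that if FALSE, the order of matches generally matters. Matches will be found in the same order as the data are sorted.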
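To see why row order can matter when matching without replacement, here is a minimal base R sketch. It is only an illustration under stated assumptions: greedy_match is a hypothetical helper doing greedy 1:1 nearest-neighbour matching on a single made-up covariate, and it is not claimed to be the algorithm Match() uses internally.

```r
# Hypothetical helper (NOT the Matching package's algorithm):
# greedy 1:1 nearest-neighbour matching without replacement on one covariate.
greedy_match <- function(treated_x, control_x) {
  available <- rep(TRUE, length(control_x))
  matched <- integer(length(treated_x))
  for (i in seq_along(treated_x)) {
    d <- abs(control_x - treated_x[i])
    d[!available] <- Inf            # a control that is already used cannot be reused
    matched[i] <- which.min(d)      # closest control that is still available
    available[matched[i]] <- FALSE
  }
  matched
}

# Made-up covariate values for two treated and three control units
treated_x <- c(10, 11)
control_x <- c(10.4, 12, 9)

# Same units, processed in two different orders
m_original <- greedy_match(treated_x, control_x)       # order: 10, then 11
m_reversed <- greedy_match(rev(treated_x), control_x)  # order: 11, then 10
```

In the first run the treated unit with value 10 is paired with control 1; in the second run control 1 has already been taken by the unit with value 11, so the same treated unit ends up with control 3. The matched set therefore depends on the order of the data, and so can any estimate computed from it.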

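So what I want to do is create a function that, for n iterations:
  • randomly reorders lalonde$ID
  • runs the matching process
  • runs the clogit algorithm
  • saves exp(output$coefficients[1]) from each iteration
  • plots the OR (exp(output$coefficients[1])) for each of the n iterations

In essence, I want to introduce permutation into the analysis. How can this be done, say for n = 5?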

2 answers:

Answer 0: (score: 1)

You can use sample to introduce the permutations:

data(lalonde)
lalonde$age_cat <- with(lalonde, ifelse(age < 24, 1, 2))
lalonde$ID <- 1:length(lalonde$age)
n <- 5
res <- rep(NA, n)
for (i in 1:n) {
    lalonde <- lalonde[sample(1:nrow(lalonde)), ] # randomise order
    ## rest of code 
    res[i] <- exp(output$coefficients[1])
}

plot(1:n, res, main="Odds Ratios")

Answer 1: (score: 1)

I'm a big fan of replicate:

X <- cbind(...) # what you had before
BalanceMat <- cbind(...) # ditto
lalonde$ID <- seq.int(nrow(lalonde))
results <- replicate(1000, {
    ## not certain if it's just $ID order that matters
    lalonde$ID <- sample(nrow(lalonde))
    ## lalonde <- lalonde[ sample(nrow(lalonde)), ]
    ## ...
    ## rest of your computation
    ## ...
    #### optionally return everything
    ## output
    #### return just the minimum
    exp(output$coefficients[1])
})
#### if you returned output earlier, you'll need this, otherwise not
## coef <- exp(sapply(results, function(z) z$coefficients[1]))
## plot as needed

I don't know whether you mean that the order of $ID matters or the order of the whole data frame; adjust the first few lines inside the replicate loop accordingly.
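As a rough, self-contained illustration of how a for loop and replicate relate, the sketch below runs both on simulated data. Everything in it is an assumption made for the example: toy is made-up data, and order_dependent_stat is a hypothetical placeholder standing in for "match, fit clogit, return the OR".

```r
set.seed(42)
toy <- data.frame(ID = 1:50, y = rnorm(50))   # simulated stand-in data

# Hypothetical placeholder for the real analysis: any statistic that
# depends on row order is enough to illustrate the pattern.
order_dependent_stat <- function(d) mean(d$y[1:10])

n <- 5

# Version 1: explicit for loop with a preallocated result vector
set.seed(1)
res_loop <- numeric(n)
for (i in seq_len(n)) {
  shuffled <- toy[sample(nrow(toy)), ]        # permute the rows
  res_loop[i] <- order_dependent_stat(shuffled)
}

# Version 2: replicate evaluates the block n times and collects the values
set.seed(1)
res_rep <- replicate(n, {
  shuffled <- toy[sample(nrow(toy)), ]        # permute the rows
  order_dependent_stat(shuffled)
})
```

With the same seed the two versions draw the same permutations, so res_loop and res_rep hold the same values; either one can then be plotted against seq_len(n). Note that each pass here shuffles a local copy and leaves toy itself unchanged.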