Resampling in logistic regression

Time: 2016-02-22 16:46:03

Tags: r bootstrapping glm resampling

I have a simple dataset of 100 observations with one response Y and 10 predictors (X1-X10), each coded as 0, 1 or 2.

 n <- 100
 Y <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
 X1 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.4,0.5))
 X2 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.5,0.25,0.25))
 X3 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.3,0.4,0.4))
 X4 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3))
 X5 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.2,0.7))
 X6 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.8,0.1,0.1))
 X7 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.1,0.8))
 X8 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3))
 X9 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3))
X10 <- c(0,2,2,2,2,2,2,2,0,2,0,2,2,0,0,0,0,0,2,0,0,2,2,0,0,2,2,2,0,2,0,2,0,2,1,2,1,1,1,1,1,1,1,1,1,1,1,0,1,2,2,2,2,2,2,2,2,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0)

datasim <- data.frame(Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10)

I'm doing bootstrap resampling as follows, which generates 100 different resampled sets for one variable:

 B <- 100
 n <- length(datasim$X1)
 boot.samples <- matrix(sample(datasim$X1, size=B*n, replace=TRUE),B,n)

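Now I'm trying to write a function that fits a GLM and computes the difference in deviance. I want one dDeviance value for each bootstrap sample (100 values in total). I tried the following function, but it just returns the same dDeviance value 100 times.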

 xfunction <- function(x){
 glmfit <- glm(Y~X1, family="binomial", data=datasim)
 dDeviance <- glmfit$null.deviance-glmfit$deviance
 return(dDeviance)
 }

 boot.statistics <- apply(boot.samples,1,xfunction)

2 answers:

Answer 0 (score: 0)

As Jeffrey said, the glm call should use data = x.
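For data = x to work, x has to be a data frame that contains both Y and X1; in the question, x is a plain vector of X1 values. A minimal sketch, assuming the goal is to resample whole rows of the data: iterate apply() over a matrix of row indices rather than a matrix of X1 values, and build the data frame for each replicate inside the function. The simulated Y and X1 below are stand-ins for the question's data (Y is assumed to be 50 ones followed by 50 zeros), and boot.idx is a hypothetical name.

```r
set.seed(1)
n <- 100
# Stand-in data: only Y and X1 from the question are needed here
Y  <- rep(c(1, 0), each = 50)
X1 <- sample(x = c(0, 1, 2), size = n, replace = TRUE, prob = c(0.1, 0.4, 0.5))
datasim <- data.frame(Y, X1)

B <- 100
# Each row of boot.idx holds n row numbers drawn with replacement
boot.idx <- matrix(sample(seq_len(n), size = B * n, replace = TRUE), B, n)

xfunction <- function(idx) {
  d <- datasim[idx, ]   # one bootstrap data frame; Y and X1 stay paired
  glmfit <- glm(Y ~ X1, family = "binomial", data = d)
  glmfit$null.deviance - glmfit$deviance
}

# One dDeviance per row of boot.idx, i.e. per bootstrap replicate
boot.statistics <- apply(boot.idx, 1, xfunction)
```

Because each call now receives different rows, boot.statistics holds 100 distinct values instead of one value repeated.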


Answer 1 (score: 0)

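When xfunction is used with apply like this, its argument is one row of the matrix. In your original code that row is never used, so the function runs on the same data every time. One way to fix this is, as suggested, to pass the new data to glm on each call (glmfit <- glm(Y~X1, family="binomial", data=x)), but that assumes x is a data frame with columns named Y and X1, whereas the x you actually have is a vector of X1 values. The simplest solution is to substitute the resampled X1 on each call: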

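xfunction <- function(x){
  # x is one row of boot.samples: a resampled vector of X1 values.
  # Y is taken from the global environment.
  glmfit <- glm(Y~x, family="binomial")
  dDeviance <- glmfit$null.deviance-glmfit$deviance
  return(dDeviance)
}

boot.statistics <- apply(boot.samples,1,xfunction)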
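One caveat on this fix: x is a resampled vector of X1 while Y stays in its original order, so each replicate pairs Y with X1 values drawn independently of it. If the aim is the usual bootstrap of (Y, X1) pairs, the recommended boot package can handle the row resampling. A sketch under that assumption: boot() calls the statistic as statistic(data, indices), and dev_diff is a hypothetical name; the simulated data are stand-ins for the question's (Y is assumed to be 50 ones followed by 50 zeros).

```r
library(boot)

set.seed(1)
n <- 100
Y  <- rep(c(1, 0), each = 50)
X1 <- sample(x = c(0, 1, 2), size = n, replace = TRUE, prob = c(0.1, 0.4, 0.5))
datasim <- data.frame(Y, X1)

# boot() passes the resampled row indices as the second argument
dev_diff <- function(data, indices) {
  d <- data[indices, ]
  fit <- glm(Y ~ X1, family = binomial, data = d)
  fit$null.deviance - fit$deviance
}

res <- boot(data = datasim, statistic = dev_diff, R = 100)
res$t0  # dDeviance on the original data
res$t   # 100 x 1 matrix: one dDeviance per bootstrap replicate
```

Passing indices rather than copies of the data keeps the statistic function simple and lets boot() manage the resampling scheme.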