概率模型 - 计算平均ED50的标准误差

时间:2015-03-23 07:49:08

标签: r statistics regression probability

我无法通过glm()函数计算概率回归中ED(LD)参数的标准误差估计值。我得到的结果并不包含当前ED水平的标准误差(使用学生t检验估算2种不同的选择是必要的)。

我使用以下功能(从期刊文章中获得):

LD <- function(r, n, d, conf.level) {
    ## Set up a number series 
p <- seq(1, 99, 1)

## r=number responding, n=number treated, d=dose (untransformed), confidence interval level, 
    mod <- glm(cbind(r, (n-r)) ~ log10(d), family = binomial(link=probit))
    ### Calculate heterogeneity correction to confidence intervals according to Finney, 1971, (p.
### 72, eq. 4.27; also called "h")
    het = deviance(mod)/df.residual(mod)
    if(het < 1){het = 1} ### Heterogeneity cannot be less than 1

    ## Extract slope and intercept
    summary <- summary(mod, dispersion=het, cor = F)
    intercept <- summary$coefficients[1]
    interceptSE <- summary$coefficients[3]
    slope <- summary$coefficients[2]
    slopeSE <- summary$coefficients[4]
    z.value <- summary$coefficients[6]
    N <- sum(n)

    ## Intercept (alpha)
    b0<-intercept
    ## Slope (beta)
    b1<-slope
## Slope variance 
    vcov = summary(mod)$cov.unscaled
    var.b0<-vcov[1,1]
    ## Intercept variance
    var.b1<-vcov[2,2]
    ## Slope intercept covariance
    cov.b0.b1<-vcov[1,2]

    ## Adjust alpha depending on heterogeneity (Finney, 1971, p. 76)
    alpha<-1-conf.level
    if(het > 1) {talpha <- -qt(alpha/2, df=df.residual(mod))} else {talpha <- -qnorm(alpha/2)}

    ## Calculate g (Finney, 1971, p 78, eq. 4.36)  
## "With almost all good sets of data, g will be substantially smaller than 1.0 and 
## seldom greater than 0.4."
    g <- het * ((talpha^2 * var.b1)/b1^2)

## Calculate theta.hat for all LD levels based on probits in eta (Robertson et al., 2007, pg. 
## 27; or "m" in Finney, 1971, p. 78)
    eta = family(mod)$linkfun(p/100)  #probit distribution curve
    theta.hat <- (eta - b0)/b1

    ## Calculate correction of fiducial limits according to Fieller method (Finney, 1971, 
## p. 78-79. eq. 4.35) 
const1 <- (g/(1-g))*(theta.hat + cov.b0.b1/var.b1) # const1 <- (g/(1-g))*(theta.hat -   cov.b0.b1/var.b1)
const2a <- var.b0 + 2*cov.b0.b1*theta.hat + var.b1*theta.hat^2 - g*(var.b0 - (cov.b0.b1^2/var.b1))
    const2 <- talpha/((1-g)*b1) * sqrt(het * (const2a))

    ## Calculate the confidence intervals LCL=lower, UCL=upper (Finney, 1971, p. 78-79. eq. 4.35) 
    LCL <- (theta.hat + const1 - const2)
    UCL <- (theta.hat + const1 + const2)

    ## Calculate variance for theta.hat (Robertson et al., 2007, pg. 27)
    var.theta.hat <- (1/(theta.hat^2)) * ( var.b0 + 2*cov.b0.b1*theta.hat + var.b1*theta.hat^2 )

    ## Make a data frame from the data at all the different values
    ECtable <- data.frame(
      "p"=p,
      "N"=N,
      "EC"=10^theta.hat,
      "LCL"=10^LCL,
      "UCL"=10^UCL, 
      "slope"=slope, 
      "slopeSE"=slopeSE, 
      "intercept"=intercept, 
      "interceptSE"=interceptSE, 
      "z.value"=z.value, 
      "chisquare"=deviance(mod), 
      "df"=df.residual(mod), 
      "h"=het, 
      "g"=g,
      "theta.hat"=theta.hat,
      "var.theta.hat"=var.theta.hat)

    ## Select output level
    return(ECtable)
}

使用示例:

result <- LD(data$effected, data$total, data$dose, 0.95)

在结果中,我得到了估计的ED水平:

p   N   EC    LCL   UCL slope   slopeSE intercept   interceptSE z.value chisquare   df  h   g   theta.hat   var.theta.hat
49  49  24    39.26365  32.92103    41.18617    30.92579    12.21311    -49.32049   19.69812    2.532179    0.8584527   3   1   0.5991111   1.593991    0.06240212  0.3076671
50  50  24    39.33701  33.16216    41.26921    30.92579    12.21311    -49.32049   19.69812    2.532179    0.8584527   3   1   0.5991111   1.594801    0.06060359  0.3004097
51  51  24    39.41050  33.40318    41.35474    30.92579    12.21311    -49.32049   19.69812    2.532179    0.8584527   3   1   0.5991111   1.595612    0.05888568  0.2933069

但目前的ED级别没有估计的标准误差,也没有此标准误差的自由度(df)。 有人知道如何计算当前ED水平的标准误差和df吗?你能帮我修改这个功能更有用吗?

在许多不同的数学统计和建模计算中,ED水平的std.error都是不同的:

  • Finney&#34; Probit分析&#34; (第三版)建议使用以下公式来表示标准误差:Sm = 1/b*sqrt(Snw); SE(LD50) = 10^m * log(e)10 * Sm
  • Finney还有其他更通用的公式:V(m) = 1/b^2 * { 1/Snw + (m-x')^2/Snw(x-x')^2 }
  • Stata和STATISTICA的文档指出ED50的标准误差约为SEed50 = (ED84 - ED16) / sqrt(2N),其中N是分析组中动物的总数。

1 个答案:

答案 0 :(得分:0)

这并没有真正回答SE的问题。 相反,它是我问题的宝藏,我已经问过了。

如何从glm probit模型得到Probit的浓度估计,类似于SPSS概率分析;和信心限制。 [使用R中的Probit分析计算LLOD

### Columns of data: response(r) = detected; number_obs(n) = total; dose(d) = conc
mdf = data.frame(conc=    c(50, 25,12.5,6.25,3.125, 1),
                 total=   c(10, 10, 10 ,  10,   10, 4),
                 detected=c(10, 10, 10 ,   8,    7, 0) )
#  model    (    r          n - r     )    ~  d
m3 =glm(cbind(detected, (total-detected))  ~ conc, family=binomial(link="probit"), data=mdf)

尝试使用原始数据作为二进制1/0或T / F(上面的&#34;也可以&#34;)

expand_probit_data <- function(mdf) {
  md_binary = mdf[0, c("conc", "detected")]
  for (x in 1:nrow(mdf)){
    n = mdf[x,"total"] ; r = mdf[x, "detected"] ; d= mdf[x, "conc"]
    md_binary <- rbind(md_binary, data.frame(conc=rep(d, n), detected = c(rep(1,r),rep(0,n-r))))
  }   # binary data can be T/F or TRUE/FALSE or 1 / 0 # but NOT character or levels e.g. "DET" / "NDET" 
  return (md_binary)
} # this will expand the short table to long binary form (somewhat reverse of : xtabs(~conc+detected, md_binary))    
md_binary <- expand_probit_data(mdf)
#  model     r  ~   d
m4 =glm(detected ~ conc, family=binomial(link="probit"), data=md_binary)

现在预测概率值,p水平的浓度和置信限。这是从您的代码中真正提取的。感谢您的深入研究和提供参考。

get_estimate <- function (mod, p_level= c(0.1, 0.5, 0.9, 0.95, 0.99)) {
  ### Calculate heterogeneity correction to confidence intervals according to Finney, 1971, (p.72, eq. 4.27; also called "h")
  het = deviance(mod)/df.residual(mod) ; if(het < 1){het = 1} 
  # Heterogeneity cannot be less than 1 
  # R sets dispersion paramerter 1 by default #so I use 1, change if needed
  ## Extract slope and intercept 
  summary <- summary(mod, dispersion= 1, cor = F) # summary(mod, dispersion= het, cor = F)  # summary might change if het is lot > 1
  intercept <- summary$coefficients[1] ; interceptSE <- summary$coefficients[3]
  slope     <- summary$coefficients[2] ;     slopeSE <- summary$coefficients[4]
  z.value <- summary$coefficients[6]
  N <- sum(mdf$total) # or for m3 #  N <- nrow(md_binary) #this needs to be fixed: getting data from outside the supplied variables

  ## Intercept (alpha)  ## Slope (beta)
  b0 <- intercept   ;   b1 <- slope
  ## Slope variance  # Intercept variance  # Slope intercept covariance
  vcov = summary(mod)$cov.unscaled
  var.b0<-vcov[1,1] ;  var.b1<-vcov[2,2] ; cov.b0.b1<-vcov[1,2]

  ## Adjust alpha depending on heterogeneity (Finney, 1971, p. 76)
  alpha= 0.05  # fixed, otherwise # 1-conf.level  # e.g. if conf.level = 0.95
  if(het > 1) {talpha = -qt(alpha/2, df=df.residual(mod))} else {talpha = -qnorm(alpha/2)}

  ## Calculate g (Finney, 1971, p 78, eq. 4.36) ## "With almost all good sets of data, g will be substantially smaller than 1.0 and  seldom greater than 0.4."
  g <- het * ((talpha^2 * var.b1)/b1^2)

  ## Estimate   for all LD levels based on probits in eta ~~~~~~~~~~~| 
  ## (Robertson et al., 2007, pg.27; or "m" in Finney, 1971, p. 78)  |
  eta = family(mod)$linkfun(p_level)   #  probit distribution curve  |   p_levels = c(0.5, 0.9, 0.95, 0.99)
  eta_conc = (eta - b0)/b1  #  returns the conc or dose at p level   |   b0 = intercept ; b1 = slope
  ##----- this section was critical to my calculation _______________|  No way could I have got this estimate without this(@Mihail Pyatinskyi) post.
  # term theta.hat replaced by eta_conc here, for my convenience

  ## Calculate correction of fiducial limits according to Fieller method (Finney, 1971, p. 78-79. eq. 4.35) 
  const1 <- (g/(1-g))*(eta_conc + cov.b0.b1/var.b1) # const1 <- (g/(1-g))*(eta_conc -   cov.b0.b1/var.b1)
  const2a <- var.b0 + 2*cov.b0.b1*eta_conc + var.b1*eta_conc^2 - g*(var.b0 - (cov.b0.b1^2/var.b1))
  const2 <- talpha/((1-g)*b1) * sqrt(het * (const2a))

  ## Calculate the confidence intervals LCL=lower, UCL=upper (Finney, 1971, p. 78-79. eq. 4.35) 
  LCL <- (eta_conc + const1 - const2)
  UCL <- (eta_conc + const1 + const2)
  ##
  # Calculate variance (Robertson et al., 2007, pg. 27)
  var.eta_conc <- (1/(eta_conc^2)) * ( var.b0 + 2*cov.b0.b1*eta_conc + var.b1*eta_conc^2 )
  xtxt = cbind(p_level,conc_est=eta_conc, LCL=LCL, UCL=UCL, variance=var.eta_conc, sd.eta_conc=sqrt(var.eta_conc))
  return (xtxt)
}

0.5用于EC50或ED50计算,就像你的(我猜) 人们根据具体情况使用EC90(0.9)或LD10(0.1)等。 我专门用于计算概率为0.95的LLOD(检测下限,分子检测分析)。此脚本输出与SPSS的Probit计算非常接近。

p_level = c(0.5, 0.9, 0.95, 0.99)    # predict at these probability levels     
get_estimate(m3, p_level)

Comparison of the two model's output to SPSS output

希望你现在自己想出SE。您已经进行了方差计算,我使用sqrt来获得标准偏差。研究了这个示例数据。