给出估计参数的对数似然计算

时间:2018-04-09 22:19:46

标签: r lme4 mixed-models model-fitting log-likelihood

一般情况下:我想根据数据 O 中的估计模型参数来计算数据 N 的(log)似然性。

更具体地说,我想知道下面的ll_given_modPars函数是否存在于处理数据建模的{R}包中(lme4glmm等),如下所示这个抽象的例子(没有运行):

library(lme4)
o_model <- lmer(observed ~ fixed.id + (1|random.id), data = O, REML = F)
n_logLik <- ll_given_modPars(model.estimates = o_model, data = N)

为简单起见,上面的虚构示例是在线性混合模型上但我想最终在广义线性混合模型中进行此操作,该模型处理Poisson族或直接负二项式(对于lme4:{{ 1}}或glmer(..., family="poisson"))。

从我所看到的,大多数软件包处理参数估计(很好,我需要)但是然后使用glmer.nb或者某种程度比较相同数据的模型和固定和随机效果的不同组合不是我想做的事。

我想要在不同数据上使用相同参数的对数似然性。

主要尝试:

  1. 在找不到似乎正在做的功能后,我想“简单地”将anova代码调整到我的目的:它计算给定数据的参数的对数似然,所以我想我可以使用相同的框架,但没有优化不同的参数,但隔离似然计算功能,只给它参数和数据。不幸的是,代码有点高于我目前的技能https://github.com/lme4/lme4/blob/master/R/nbinom.R(我对他们如何使用优化的对象感到有点迷失。)

  2. 我想过自己做一个可能性计算,从一个线性混合模型开始,然后逐步进行更多涉及的模型。但是已经有了这个example我很难跟上数学,即使按照规定使用公式,获得的对数似然仍然不同(我不知道为什么,请参阅附录中的代码)和我我担心在我能够为更多参与模型(例如泊松或负二项式)做这件事之前需要花费太长时间

  3. 此时我不确定最适合追求的途径,并感谢您提供的任何意见。

    附录:尝试根据How does lmer (from the R package lme4) compute log likelihood?计算对数似然(或找到闭合形式近似)。 lme4(来自lmer)给出-17.8的对数似然,得到-45.56

    lme4

    编辑:评论中提供了一个有用的链接(https://stats.stackexchange.com/questions/271903/understand-marginal-likelihood-of-mixed-effects-models)。从上面转换我的例子我可以复制log-likelihood

    library(lme4)
    set.seed(7)
    n <- 2  # number of groups
    m <- 4  # number of instances per group
    fixed.effect <- c(0, -2, -1, 1) 
    tau <- 5 # standard deviation of random effects
    sigma <- 2 # standard deviation of error
    random.effect <- rnorm(n, mean=0, sd=tau)
    sim.data <- data.frame(GROUP.ID=as.factor(rep(1:n, each=m)),
                         GROUP.EFFECT=rep(random.effect, each=m),
                         INSTANCE.ID=as.factor(rep(1:m, times=n)),
                         INSTANCE.EFFECT=rep(fixed.effect, times=n))
    # calculate expected Y value
    sim.data$EXPECT.Y <- sim.data$GROUP.EFFECT + sim.data$INSTANCE.EFFECT
    # now observe Y value, assuming normally distributed with fixed std. deviation
    sim.data$OBS.Y <- rnorm(nrow(sim.data), mean=sim.data$EXPECT.Y, sigma)
    
    
    model <- lmer(OBS.Y ~ INSTANCE.ID + (1|GROUP.ID), data = sim.data, REML=F)
    summary(model)
    
    toy.model.var <- VarCorr(model)
    toy.model.sigma <- attr(toy.model.var, 'sc') # corresponds to the epsilon, residual standard deviation
    toy.model.tau.squared <- toy.model.var[[1]][1] # corresponds to variance of random effects
    toy.model.betas <- model@beta
    
    # left product, spread within gropus
    toy.data <- rbind(sim.data$OBS.Y[1:4], sim.data$OBS.Y[5:8])
    toy.mean.adj <- rbind(toy.data[1,] - mean(unlist(toy.data[1,])), toy.data[2,] - mean(unlist(toy.data[2,])))
    toy.mean.adj.prod1 <- prod(dnorm(unlist(toy.mean.adj[1,]), mean = 0, sd = toy.model.sigma))
    toy.mean.adj.prod2 <- prod(dnorm(unlist(toy.mean.adj[2,]), mean = 0, sd = toy.model.sigma))
    toy.mean.adj.final.prod <- toy.mean.adj.prod1 * toy.mean.adj.prod2
    
    # right product, spread between gropus
    toy.mean.beta.adj <- rbind(mean(unlist(toy.data[1,])) - toy.model.betas, mean(unlist(toy.data[2,])) - toy.model.betas)
    toy.mean.beta.adj[1,] <- toy.mean.beta.adj[1,] - c(0, toy.model.betas[1], toy.model.betas[1], toy.model.betas[1])
    toy.mean.beta.adj[2,] <- toy.mean.beta.adj[2,] - c(0, toy.model.betas[1], toy.model.betas[1], toy.model.betas[1])
    toy.mean.beta.adj.prod1 <- prod(dnorm(unlist(toy.mean.beta.adj[1,]), mean = 0, sd = sqrt(toy.model.sigma^2/4 + toy.model.tau.squared)) * sqrt(2/4*pi*toy.model.sigma^2))
    toy.mean.beta.adj.prod2 <- prod(dnorm(unlist(toy.mean.beta.adj[2,]), mean = 0, sd = sqrt(toy.model.sigma^2/4 + toy.model.tau.squared)) * sqrt(2/4*pi*toy.model.sigma^2))
    
    toy.mean.beta.adj.final.prod <- toy.mean.beta.adj.prod1 * toy.mean.beta.adj.prod2
    
    toy.total.prod <- toy.mean.adj.final.prod * toy.mean.beta.adj.final.prod
    log(toy.total.prod)
    

1 个答案:

答案 0 :(得分:0)

虽然我没有设法为所有这些解决方案提供封闭式解决方案,但我确实设法使用数字集成来重现对数似然。我在下面的小例子中发布了它在LMM设置中的工作原理(假设正常残差随机效应)以及带泊松和负二项式的GLMM。请注意,特别是后者在增加样本量时往往会略有不同。我的猜测是在某处发生了一些舍入,但就我的目的而言,这里达到的精度已经足够了。我现在接受我自己的回答,但如果有人发布Poisson或Negative-Binomial的封闭表格,我会很乐意接受你的答案:)

library(lme4)
library(mvtnorm)

################################################################################
# LMM numerical integration

set.seed(7)

n <- 2  # number of groups
m <- 4  # number of instances per group
fixed.effect <- c(0, -2, -1, 1)
tau <- 5 # standard deviation of random effects
sigma <- 2 # standard deviation of error
random.effect <- rnorm(n, mean=0, sd=tau)
normal.data <- data.frame(GROUP.ID=as.factor(rep(1:n, each=m)),
                       GROUP.EFFECT=rep(random.effect, each=m),
                       INSTANCE.ID=as.factor(rep(1:m, times=n)),
                       INSTANCE.EFFECT=rep(fixed.effect, times=n))
# calculate expected Y value
normal.data$EXPECT.Y <- normal.data$GROUP.EFFECT + normal.data$INSTANCE.EFFECT
# now observe Y value, assuming normally distributed with fixed std. deviation
normal.data$OBS.Y <- rnorm(nrow(normal.data), mean=normal.data$EXPECT.Y, sigma)


normal.model <- lmer(OBS.Y ~ INSTANCE.ID + (1|GROUP.ID), data = normal.data, REML=F)
summary(normal.model)

normal.model.var <- VarCorr(normal.model)
normal.model.sigma <- attr(normal.model.var, 'sc') # corresponds to the epsilon, residual standard deviation
normal.model.tau.squared <- normal.model.var[[1]][1] # corresponds to variance of random effects
normal.model.betas <- normal.model@beta

normal.group.tau <- sqrt(normal.model.tau.squared)
normal.group.sigma <- sigma(normal.model)
normal.group.beta <- predict(normal.model, re.form=NA)[1:4]

integrate_group1 <- function(x){
  p1 <- dnorm(normal.data$OBS.Y[1] - normal.group.beta[1] - x, mean = 0, sd = normal.group.sigma) * dnorm(x, mean = 0, sd = normal.group.tau)
  p2 <- dnorm(normal.data$OBS.Y[2] - normal.group.beta[2] - x, mean = 0, sd = normal.group.sigma)
  p3 <- dnorm(normal.data$OBS.Y[3] - normal.group.beta[3] - x, mean = 0, sd = normal.group.sigma)
  p4 <- dnorm(normal.data$OBS.Y[4] - normal.group.beta[4] - x, mean = 0, sd = normal.group.sigma)

  p_out <- p1 * p2 * p3 * p4
  p_out
}

normal.group1.integration <- integrate(integrate_group1, lower = -10*normal.group.tau, upper = 10*normal.group.tau, subdivisions = 10000L, rel.tol = 1e-10, abs.tol = 1e-50)$value[1]

integrate_group2 <- function(x){
  p1 <- dnorm(normal.data$OBS.Y[5] - normal.group.beta[1] - x, mean = 0, sd = normal.group.sigma) * dnorm(x, mean = 0, sd = normal.group.tau)
  p2 <- dnorm(normal.data$OBS.Y[6] - normal.group.beta[2] - x, mean = 0, sd = normal.group.sigma)
  p3 <- dnorm(normal.data$OBS.Y[7] - normal.group.beta[3] - x, mean = 0, sd = normal.group.sigma)
  p4 <- dnorm(normal.data$OBS.Y[8] - normal.group.beta[4] - x, mean = 0, sd = normal.group.sigma)

  p_out <- p1 * p2 * p3 * p4
  p_out
}

normal.group2.integration <- integrate(integrate_group2, lower = -10*normal.group.tau, upper = 10*normal.group.tau, subdivisions = 10000L, rel.tol = 1e-10, abs.tol = 1e-50)$value[1]

log(normal.group1.integration) + log(normal.group2.integration)


#################################
# Poisson numerical integration
set.seed(13) #13
n <- 2  # number of groups
m <- 4  # number of instances per group
# effect sizes are much smaller since they are exponentiated
fixed.effect <- c(0, -0.2, -0.1, 0.2)
tau <- 1.5 # standard deviation of random effects
# sigma <- 1.5 # standard deviation of error
random.effect <- rnorm(n, mean=0, sd=tau)  # guide effect
poisson.data <- data.frame(GROUP.ID=as.factor(rep(1:n, each=m)),
                       GROUP.EFFECT=rep(random.effect, each=m),
                       INSTANCE.ID=as.factor(rep(1:m, times=n)),
                       INSTANCE.EFFECT=rep(fixed.effect, times=n))
# calculate expected Y value
poisson.data$EXPECT.Y <- exp(poisson.data$GROUP.EFFECT + poisson.data$INSTANCE.EFFECT)
# now observe Y value, assuming normally distributed with fixed std. deviation
poisson.data$OBS.Y <- rpois(nrow(poisson.data), poisson.data$EXPECT.Y)

poisson.model <- glmer(OBS.Y ~ INSTANCE.ID + (1|GROUP.ID), data = poisson.data, family="poisson")
summary(poisson.model)

poisson.model.var <- VarCorr(poisson.model)
poisson.model.sigma <- attr(poisson.model.var, 'sc') # corresponds to the epsilon, residual standard deviation
poisson.model.tau.squared <- poisson.model.var[[1]][1] # corresponds to variance of random effects
poisson.model.betas <- poisson.model@beta

poisson.group.tau <- sqrt(poisson.model.tau.squared)
poisson.group.sigma <- sigma(poisson.model)
poisson.group.beta <- predict(poisson.model, re.form=NA)[1:4]

integrate_group1 <- function(x){
  p1 <- dpois(poisson.data$OBS.Y[1], lambda = exp(poisson.group.beta[1] + x)) * dnorm(x, mean = 0, sd = poisson.group.tau)
  p2 <- dpois(poisson.data$OBS.Y[2], lambda = exp(poisson.group.beta[2] + x))
  p3 <- dpois(poisson.data$OBS.Y[3], lambda = exp(poisson.group.beta[3] + x))
  p4 <- dpois(poisson.data$OBS.Y[4], lambda = exp(poisson.group.beta[4] + x))

  p_out <- p1 * p2 * p3 * p4
  p_out
}

poisson.group1.integration <- integrate(integrate_group1, lower = -10*poisson.group.tau, upper = 10*poisson.group.tau, subdivisions = 10000L, rel.tol = 1e-10, abs.tol = 1e-50)$value[1]

integrate_group2 <- function(x){
  p1 <- dpois(poisson.data$OBS.Y[5], lambda = exp(poisson.group.beta[1] + x)) * dnorm(x, mean = 0, sd = poisson.group.tau)
  p2 <- dpois(poisson.data$OBS.Y[6], lambda = exp(poisson.group.beta[2] + x))
  p3 <- dpois(poisson.data$OBS.Y[7], lambda = exp(poisson.group.beta[3] + x))
  p4 <- dpois(poisson.data$OBS.Y[8], lambda = exp(poisson.group.beta[4] + x))

  p_out <- p1 * p2 * p3 * p4
  p_out
}

poisson.group2.integration <- integrate(integrate_group2, lower = -10*poisson.group.tau, upper = 10*poisson.group.tau, subdivisions = 10000L, rel.tol = 1e-10, abs.tol = 1e-50)$value[1]

log(poisson.group1.integration) + log(poisson.group2.integration)



#############
# Negative-Binomial numerical integration

set.seed(13) #13
n <- 100  # number of groups
m <- 4  # number of instances per group
# effect sizes are much smaller since they are exponentiated
fixed.effect <- c(0, -0.2, -0.1, 0.2)
tau <- 1.5 # standard deviation of random effects
theta <- 0.5
# sigma <- 1.5 # standard deviation of error
random.effect <- rnorm(n, mean=0, sd=tau)  # guide effect
nb.data <- data.frame(GROUP.ID=as.factor(rep(1:n, each=m)),
                       GROUP.EFFECT=rep(random.effect, each=m),
                       INSTANCE.ID=as.factor(rep(1:m, times=n)),
                       INSTANCE.EFFECT=rep(fixed.effect, times=n))
# calculate expected Y value
nb.data$EXPECT.Y <- exp(nb.data$GROUP.EFFECT + nb.data$INSTANCE.EFFECT)
# now observe Y value, assuming normally distributed with fixed std. deviation
nb.data$OBS.Y <- rnbinom(nrow(nb.data), mu = nb.data$EXPECT.Y, size = theta)

nb.model <- glmer.nb(OBS.Y ~ INSTANCE.ID + (1|GROUP.ID), data = nb.data)
summary(nb.model)

nb.model.var <- VarCorr(nb.model)
nb.model.sigma <- attr(nb.model.var, 'sc') # corresponds to the epsilon, residual standard deviation
nb.model.tau.squared <- nb.model.var[[1]][1] # corresponds to variance of random effects
nb.model.betas <- nb.model@beta

nb.group.tau <- sqrt(nb.model.tau.squared)
nb.group.beta <- predict(nb.model, re.form=NA)[1:4]
nb.group.dispersion <- getME(nb.model, "glmer.nb.theta")

integration_function_generator <- function(input.obs, input.beta, input.dispersion, input.tau){
  function(x){
    p1 <- dnbinom(input.obs[1], mu = exp(input.beta[1] + x), size = input.dispersion) * dnorm(x, mean = 0, sd = input.tau)
    p2 <- dnbinom(input.obs[2], mu = exp(input.beta[2] + x), size = input.dispersion)
    p3 <- dnbinom(input.obs[3], mu = exp(input.beta[3] + x), size = input.dispersion)
    p4 <- dnbinom(input.obs[4], mu = exp(input.beta[4] + x), size = input.dispersion)

    p_out <- p1 * p2 * p3 * p4
    p_out
  }
}

nb.all.group.integrations <- c()
for(i in 1:n){
  temp.obs <- nb.data$OBS.Y[(1:4)+(i-1)*4]
  temp_integrate_function <- integration_function_generator(temp.obs, nb.group.beta, nb.group.dispersion, nb.group.tau)
  temp.integration <- integrate(temp_integrate_function, lower = -10*nb.group.tau, upper = 10*nb.group.tau, subdivisions = 10000L, rel.tol = 1e-10, abs.tol = 1e-50)$value[1]
  nb.all.group.integrations <- c(nb.all.group.integrations, temp.integration)
}
sum(log(nb.all.group.integrations))