Question

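I'm trying to write my own functions to understand how the Poisson distribution behaves within a maximum likelihood estimation framework (as it applies to GLMs).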

I'm familiar with R's handy glm function, but wanted to try hand-rolling some code to understand what's going on:

n <- 10000 # sample size
b0 <- 1.0 # intercept
b1 <- 0.2 # coefficient
x <- runif(n=n, min=0, max=1.5) # generate covariate values
lp <- b0+b1*x # linear predictor
lambda <- exp(lp) # compute lambda
y <- rpois(n=n, lambda=lambda) # generate y-values
dta <- data.frame(y=y, x=x) # generate dataset
negloglike <- function(lambda) {n*lambda-sum(x)*log(lambda) + sum(log(factorial(y)))} # build negative log-likelihood
starting.vals <- c(0,0) # one starting value for each parameter
pars <- c(b0, b1)
maxLike <- optim(par=pars,fn=negloglike, data = dta) # optimize

Running this code (and then typing maxLike) gives the following R output:

Error in fn(par, ...) : unused argument (data = list(y = c(2, 4....

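I think I've specified optim incorrectly for my function, but I'm not familiar enough with the ins and outs of MLE or constrained optimization to understand what I'm missing.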
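For reference, here is a sketch of the objective under the model simulated above, assuming log(lambda_i) = b0 + b1*x_i as in the code. The negative log-likelihood is a function of the two coefficients (the vector optim varies), and it sums over the responses y_i, not the covariate x_i. The second line is the special case of one common lambda, which is what negloglike was aiming at; note sum(y) where the code has sum(x).

$$
-\ell(b_0, b_1) = \sum_{i=1}^{n} \left[ \lambda_i - y_i \log \lambda_i + \log(y_i!) \right], \qquad \log \lambda_i = b_0 + b_1 x_i
$$

$$
-\ell(\lambda) = n\lambda - \Big(\sum_{i=1}^{n} y_i\Big) \log \lambda + \sum_{i=1}^{n} \log(y_i!) \quad \text{(single common } \lambda\text{)}
$$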

Answer 1

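optim can only use your function in a particular way. It assumes that the first argument of the function takes the parameters as a vector. If you need to pass other information to the function (in your case, the data), you need to make it an argument of the function. Your negloglike function doesn't have a data argument, and that's what the error is complaining about. The way you've written your code, it doesn't need one, so you could get past this particular error by removing the data = dta part of the call, but I haven't tested that. (Be aware that negloglike as written treats its argument as a single lambda; with the two-element par it returns a length-2 vector, which optim rejects because the objective must be a scalar. It also uses sum(x) where the Poisson likelihood needs sum(y).) Here's a small example of a simple MLE for just the Poisson (not a GLM):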

negloglike_pois <- function(par, data){
  x <- data$x
  lambda <- par[1]

  -sum(dpois(x, lambda, log = TRUE))
}

dat <- data.frame(x = rpois(30, 5))
optim(par = 4, fn = negloglike_pois, data = dat)
mean(dat$x)

> optim(par = 4, fn = negloglike_pois, data = dat)
$par
[1] 4.833594

$value
[1] 65.7394

$counts
function gradient 
      22       NA 

$convergence
[1] 0

$message
NULL

Warning message:
In optim(par = 4, fn = negloglike_pois, data = dat) :
  one-dimensional optimization by Nelder-Mead is unreliable:
use "Brent" or optimize() directly
> # The "true" MLE. We didn't hit it exactly but came really close
> mean(dat$x)
[1] 4.833333
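The warning above suggests Brent's method or optimize() for a one-parameter problem. A minimal sketch of both, reusing the same negative log-likelihood; the seed and the search interval 0.01 to 20 are arbitrary choices for illustration, not part of the original answer. Brent needs finite lower and upper bounds, and in both calls the extra data = dat argument is forwarded to the objective through `...`.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
dat <- data.frame(x = rpois(30, 5))

# Same negative log-likelihood as above: par[1] is lambda
negloglike_pois <- function(par, data) {
  -sum(dpois(data$x, lambda = par[1], log = TRUE))
}

# Brent's method: one-dimensional only, needs finite bounds (chosen arbitrarily)
fit_brent <- optim(par = 4, fn = negloglike_pois, data = dat,
                   method = "Brent", lower = 0.01, upper = 20)

# optimize() on the same interval; extra arguments go on to the objective
fit_opt <- optimize(negloglike_pois, interval = c(0.01, 20), data = dat)

# Both should agree with the closed-form Poisson MLE, the sample mean
c(brent = fit_brent$par, optimize = fit_opt$minimum, mle = mean(dat$x))
```

Brent's estimate should match mean(dat$x) to many decimal places, since optim passes its relative tolerance to the underlying optimizer; optimize() with its default tolerance is typically accurate to about four decimal places.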

Answer 2

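Implementing what Dason's answer describes is pretty straightforward, but just in case, here is a version with two covariates: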

library("data.table")

d <- data.table(id = as.character(1:100), 
                x1 = runif(100, 0, 1),
                x2 = runif(100, 0, 1))

#' the assumption is that lambda can be written as
#' log(lambda) = b1*x1 + b2*x2 
#' (In addition, could add a random component)
d[, mean := exp( 1.57*x1 + 5.86*x2 )]
#' draw a y for each of the observations
#' (rpois is vectorized over lambda, so one call draws all of them)
d[, y := rpois(.N, mean)]

negloglike_pois <- function(par, data){
  # work on a copy of the table passed in, so `:=` does not modify it by reference
  data <- copy(data)
  # update estimate of the mean
  data[, mean_tmp := exp( par[1]*x1 + par[2]*x2 )]
  # calculate the contribution of each observation to the log-likelihood
  data[, log_p := dpois(y, mean_tmp, log = TRUE)]
  #' Now we can sum up the log-probabilities and negate
  data[, -sum(log_p)]
}

optim(par = c(1,1), fn = negloglike_pois, data = d)
$par
[1] 1.554759 5.872219

$value
[1] 317.8094

$counts
function gradient 
      95       NA 

$convergence
[1] 0

$message
NULL
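Applied to the question's own model (intercept b0 = 1.0, slope b1 = 0.2, one covariate), the same pattern might look like the sketch below: par comes first and the data arrive through optim's `...`. This is only an illustration with a few choices of its own: base R instead of data.table, lfactorial() instead of log(factorial()) (factorial() overflows for large counts, and the term is constant in the parameters anyway), an analytic gradient so that method = "BFGS" can be used, a tighter reltol, and a start at the intercept-only MLE, c(log(mean(y)), 0). The seed is arbitrary. The result is cross-checked against glm().

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n  <- 10000
b0 <- 1.0
b1 <- 0.2
x  <- runif(n, min = 0, max = 1.5)
y  <- rpois(n, lambda = exp(b0 + b1 * x))
dta <- data.frame(y = y, x = x)

# Negative log-likelihood: first argument is the coefficient vector,
# the data arrive through the extra `data` argument
negloglike <- function(par, data) {
  eta <- par[1] + par[2] * data$x  # linear predictor, log(lambda)
  sum(exp(eta) - data$y * eta + lfactorial(data$y))
}

# Gradient of the negative log-likelihood with respect to (b0, b1)
negloglike_grad <- function(par, data) {
  r <- exp(par[1] + par[2] * data$x) - data$y
  c(sum(r), sum(r * data$x))
}

# Start from the intercept-only MLE; `data = dta` is passed to fn and gr
fit <- optim(par = c(log(mean(dta$y)), 0), fn = negloglike,
             gr = negloglike_grad, data = dta, method = "BFGS",
             control = list(reltol = 1e-10))

# Compare with R's built-in Poisson GLM
fit_glm <- glm(y ~ x, family = poisson, data = dta)
est <- rbind(optim = fit$par, glm = unname(coef(fit_glm)))
colnames(est) <- c("b0", "b1")
est
```

Both rows should agree closely with each other; how close they are to the true 1.0 and 0.2 depends on the simulated sample.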

Hand-rolled R code for Poisson MLE