Suppose I have two data.frame objects:
df1 <- data.frame(x = 1:100)
df1$y <- 20 + 0.3 * df1$x + rnorm(100)
df2 <- data.frame(x = 1:200000)
df2$y <- 20 + 0.3 * df2$x + rnorm(200000)
I want to do maximum likelihood estimation (MLE). With df1 everything works:
LL1 <- function(a, b, mu, sigma) {
R = dnorm(df1$y - a- b * df1$x, mu, sigma)
-sum(log(R))
}
library(stats4)
mle1 <- mle(LL1, start = list(a = 20, b = 0.3, sigma=0.5),
fixed = list(mu = 0))
> mle1
Call:
mle(minuslogl = LL1, start = list(a = 20, b = 0.3, sigma = 0.5),
fixed = list(mu = 0))
Coefficients:
a b mu sigma
23.89704180 0.07408898 0.00000000 3.91681382
But if I perform the same task with df2, I get an error:
LL2 <- function(a, b, mu, sigma) {
R = dnorm(df2$y - a- b * df2$x, mu, sigma)
-sum(log(R))
}
mle2 <- mle(LL2, start = list(a = 20, b = 0.3, sigma=0.5),
fixed = list(mu = 0))
Error in optim(start, f, method = method, hessian = TRUE, ...) :
initial value in 'vmmin' is not finite
How can I overcome this?
Answer 0: (score: 5)
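At some point the value of R (the density vector computed inside the likelihood function) becomes zero, so its log is not finite; the function being minimized then takes a non-finite value and optim returns an error.
Using the argument log=TRUE handles this better; see the function LL3 below. The following produces some warnings but returns a result, with parameter estimates close to the true parameters.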
require(stats4)
set.seed(123)
e <- rnorm(200000)
x <- 1:200000
df3 <- data.frame(x)
df3$y <- 20 + 0.3 * df3$x + e
LL3 <- function(a, b, mu, sigma) {
-sum(dnorm(df3$y - a- b * df3$x, mu, sigma, log=TRUE))
}
mle3 <- mle(LL3, start = list(a = 20, b = 0.3, sigma=0.5),
fixed = list(mu = 0))
Warning messages:
1: In dnorm(df3$y - a - b * df3$x, mu, sigma, log = TRUE) : NaNs produced
2: In dnorm(df3$y - a - b * df3$x, mu, sigma, log = TRUE) : NaNs produced
3: In dnorm(df3$y - a - b * df3$x, mu, sigma, log = TRUE) : NaNs produced
4: In dnorm(df3$y - a - b * df3$x, mu, sigma, log = TRUE) : NaNs produced
5: In dnorm(df3$y - a - b * df3$x, mu, sigma, log = TRUE) : NaNs produced
6: In dnorm(df3$y - a - b * df3$x, mu, sigma, log = TRUE) : NaNs produced
7: In dnorm(df3$y - a - b * df3$x, mu, sigma, log = TRUE) : NaNs produced
8: In dnorm(df3$y - a - b * df3$x, mu, sigma, log = TRUE) : NaNs produced
> mle3
Call:
mle(minuslogl = LL3, start = list(a = 20, b = 0.3, sigma = 0.5),
fixed = list(mu = 0))
Coefficients:
a b mu sigma
19.999166 0.300000 0.000000 1.001803
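To illustrate why log=TRUE helps, here is a minimal sketch of the underflow. The residual value 40 is an arbitrary choice for illustration and is not taken from the question's data. The last line shows one plausible source of the NaN warnings above, namely the optimizer trying a negative sigma; that is an assumption, since the answer does not say where the warnings come from.

```r
# A residual 40 standard deviations from the mean (arbitrary illustrative value)
r <- 40

# The density is too small for double precision and underflows to exactly 0
dens <- dnorm(r, mean = 0, sd = 1)

# Taking the log afterwards gives -Inf, so -sum(log(R)) becomes Inf
naive <- log(dens)

# With log = TRUE the value is computed on the log scale and stays finite
stable <- dnorm(r, mean = 0, sd = 1, log = TRUE)

# A negative standard deviation yields NaN together with a "NaNs produced" warning
neg <- dnorm(1, mean = 0, sd = -0.5, log = TRUE)
```

If the warnings do come from negative values of sigma, optimizing over log(sigma) instead of sigma would keep the standard deviation positive.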
Answer 1: (score: 5)
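I ran into the same problem when minimizing a log-likelihood function. After some debugging I found that the problem was my starting values. They caused a particular matrix to have determinant = 0, which produced an error when taking the log. So the optimizer could not find any "finite" value, but only because the function was returning an error to it.
Bottom line: check whether your function returns an error (or a non-finite value) when you run it with the starting values.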
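A minimal sketch of that check, assuming a small simulated data set in the style of the question. The helper name check_start, the data frame d, and the deliberately bad starting value a = 1000 are hypothetical and introduced only for illustration.

```r
set.seed(1)
# Small simulated data set in the style of the question (illustrative only)
d <- data.frame(x = 1:100)
d$y <- 20 + 0.3 * d$x + rnorm(100)

# Negative log-likelihood written the naive way, as in LL1 and LL2
nll_naive <- function(a, b, mu, sigma) {
  R <- dnorm(d$y - a - b * d$x, mu, sigma)
  -sum(log(R))
}

# Hypothetical helper: evaluate the objective at the starting values
# and stop early if the result is not finite
check_start <- function(fn, start) {
  val <- do.call(fn, start)
  if (!is.finite(val)) {
    stop("objective is not finite at the starting values: ", val)
  }
  invisible(val)
}

# Reasonable starting values: the objective is finite
ok <- check_start(nll_naive, list(a = 20, b = 0.3, mu = 0, sigma = 0.5))

# Starting values far from the data: the densities underflow to 0,
# the objective is Inf, and the helper raises an error
bad <- tryCatch(
  check_start(nll_naive, list(a = 1000, b = 0.3, mu = 0, sigma = 0.5)),
  error = function(e) "error"
)
```

Running such a check before calling mle or optim separates a problem with the starting values from a problem in the optimizer itself.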
P.S. Marius Hofert is absolutely right: never suppress warnings.
Answer 2: (score: 0)
This is a known bug in R, Bugzilla ID 17703. It is very hard to reproduce.