我正在尝试从头开始制作有序的probit函数。
我有一些让我非常接近clm函数的结果的方法,但并非完全如此。
probit <- function(df, yvar) {
# include only complete cases
df <- df[complete.cases(df),]
# Make the data matrix
X <- df %>% select(-yvar)
# get variable names
names <- names(X)
# take out the y column and convert X to a matrix
y <- df[yvar]
X <- X %>% as.matrix()
# number of categories to be estimated
names(y) <- c("y")
M <- filter(y, !is.na(y)) %>% unique() %>% nrow()
# (neg) log likelihood function
negLL <- function(par, X, y) {
b <- par[1:ncol(X)]
t <- par[(ncol(X) + 1):(ncol(X) + M - 1)]
# Make y numeric
y <- as.numeric(y$y)
# Set the upper and lower categories to negative and positive infinity
t <- c(-Inf, t, Inf)
# Apply the normal CDF transformation to each observation's covariates,
# cycling through each threshold level with the indicator function.
mu <- matrix(NA, nrow = nrow(X), ncol = M)
for(i in 1:nrow(X)){
for(j in 2:(M+1)) { #R won't index starting at 0, so have to + 1
mu[i, j - 1] <- ifelse(y[i] == j - 1,
1 * (log(pnorm(t[j - 1] - X[i,] %*% b) -
pnorm(t[j - 2] - X[i,] %*% b))),
0)
}}
# Now compute the log likelihood by summing above
LL <- sum(mu, na.rm = T)
# Negative log likelihood
return(-LL)
}
# optimize
results <- optim(par = c(rep(0, ncol(X)), c(1:(M-1))), fn = negLL,
y = y, X = X, hessian = T)
parameters <- results$par[1:ncol(X)] %>% as.numeric()
names(parameters) <- names
thresholds <- results$par[(ncol(X)+1):(length(results$par))] %>%
as.numeric()
list(coefs = parameters, thresholds = thresholds,
varcovar = solve(results$hessian),
se = sqrt(diag(solve(results$hessian))),
deviance = 2*results$value,
converged = results$convergence == 0,
loglik = -results$value,
iterations = results$counts[[1]]) %>%
return()
}
我已经在本文上对此进行了测试:https://onlinelibrary.wiley.com/doi/full/10.1111/ajps.12290
使用此复制数据:https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/Q8CSU8
例如,对于“诱惑”结果(解释变量=“ interviewl”),我应该得到0.47的系数,但是我得到0.38。谁能在我的代码中找到问题?谢谢!