R中的自制订购的Probit

时间:2019-05-27 16:20:35

标签: r mle

我正在尝试从头开始制作有序的probit函数。

我有一些让我非常接近clm函数的结果的方法,但并非完全如此。

probit <- function(df, yvar) {
  # include only complete cases
  df <- df[complete.cases(df),]

  # Make the data matrix
  X <- df %>% select(-yvar)

  # get variable names
  names <- names(X)

  # take out the y column and convert X to a matrix
  y <- df[yvar]
  X <- X %>% as.matrix()

  # number of categories to be estimated
  names(y) <- c("y")
  M <- filter(y, !is.na(y)) %>% unique() %>% nrow()

  # (neg) log likelihood function
  negLL <- function(par, X, y) {
    b <- par[1:ncol(X)]
    t <- par[(ncol(X) + 1):(ncol(X) + M - 1)]

    # Make y numeric
    y <- as.numeric(y$y)

    # Set the upper and lower categories to negative and positive infinity
    t <- c(-Inf, t, Inf)

    # Apply the normal CDF transformation to each observation's covariates,
    # cycling through each threshold level with the indicator function.
    mu <- matrix(NA, nrow = nrow(X), ncol = M)
    for(i in 1:nrow(X)){
      for(j in 2:(M+1)) { #R won't index starting at 0, so have to + 1
        mu[i, j - 1] <-  ifelse(y[i] == j - 1,
            1 *  (log(pnorm(t[j - 1] - X[i,] %*% b) -
                        pnorm(t[j - 2] - X[i,] %*% b))),
            0)
        }}

    # Now compute the log likelihood by summing above
    LL <- sum(mu, na.rm = T)

    # Negative log likelihood
    return(-LL)
  }

  # optimize
  results <- optim(par = c(rep(0, ncol(X)), c(1:(M-1))), fn = negLL, 
                   y = y, X = X, hessian = T)
  parameters <- results$par[1:ncol(X)] %>% as.numeric()
  names(parameters) <- names
  thresholds <- results$par[(ncol(X)+1):(length(results$par))] %>% 
    as.numeric()
  list(coefs = parameters, thresholds = thresholds,
       varcovar = solve(results$hessian),
       se = sqrt(diag(solve(results$hessian))),
       deviance = 2*results$value,
       converged = results$convergence == 0,
       loglik = -results$value,
       iterations = results$counts[[1]]) %>% 
    return()
}

我已经在本文上对此进行了测试:https://onlinelibrary.wiley.com/doi/full/10.1111/ajps.12290

使用此复制数据:https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/Q8CSU8

例如,对于“诱惑”结果(解释变量=“ interviewl”),我应该得到0.47的系数,但是我得到0.38。谁能在我的代码中找到问题?谢谢!

0 个答案:

没有答案