最大化包含pbivnorm的似然函数

时间:2016-03-10 07:36:24

标签: r maximization

我有一个包含双变量正常CDF的似然函数。即使真值为零,我仍然会将相关值保持在接近1的值。

R package sampleSelection最大化包含bivaraite正常CDF的似然函数(如Van de Ven和Van Praag(1981))。我试着查看包的源代码,但无法找到他们如何编写可能性。作为参考,Van de Ven和Van Praag的论文:

The Demand for Deductibles in Private Health Insurance: A Probit Model with Sample Selection.

似然函数是方程(19),其中H表示标准正常CDF,H_2表示双变量正常CDF。

我的问题:

  1. 有人能告诉我如何在sampleSelection包中写入似然函数吗?或

  2. 有人可以告诉我为什么我在下面的代码中得到接近1的值以进行相关?

  3. 这是让我夜不能寐的代码:

    ########################################################
    #
    #  Trying to code Van de Ven and Van Praag (1981)
    #
    #
    ########################################################
    library(MASS)
    library(pbivnorm)
    library(mnormt)
    library(maxLik)
    library(sampleSelection)
    set.seed(1)
    
    # Sample size
    full_sample <- 1000
    
    # Parameters
    rho      <- .1
    beta0    <- 0
    beta1    <- 1
    gamma0   <- .2
    gamma1   <- .5
    gamma2   <- .5
    varcovar <- matrix(c(1,rho,rho,1), nrow = 2, ncol = 2) 
    # Vectors for storing values
       y <- rep(0,full_sample)
       s <- rep(0,full_sample)
    outcome  <- rep(0,full_sample)
    select   <- rep(0,full_sample)
    
    #######################
    # Simulate data
    #######################
     x <- rnorm(full_sample)
     z <- rnorm(full_sample)
    
     errors <- mvrnorm(full_sample, rep(0,2), varcovar)
    # note: 1st element for selection eq; 2nd outcome
     s <- gamma0 + gamma1*x + gamma2*z + errors[,1]
     y <- beta0 + beta1*x + errors[,2]
    
     for(i in 1:full_sample){
        if(s[i]>=0){
           select[i] <- 1
           if(y[i]>=0){
             outcome[i] <- 1
           }else{
             outcome[i] <- 0
            }
        }else{
           outcome[i] <- NA
           select[i] <- 0
        }
    
    }
    #######################################
    # Writing the log likelihood
    ##
    # Note: vega1= beta0,
    #       vega2= beta1,
    #       vega3= gamma0,
    #       vega4= gamma1,
    #       vega5= gamma2,
    #       vega6= rho
    #######################################
    first.lf <- function(vega) {
    
    # Transforming this parameter becuase
    #   correlation is bounded between -1 aad 1
      corr <- tanh(vega[6])
    
    # Set up vectors for writing log likelihood
       y0 <- 1-outcome
       for(i in 1:full_sample) {    
         if(is.na(y0[i])){ y0[i]<- 0}
         if(is.na(outcome[i])){ outcome[i]<- 0}
       }
       yt0 <- t(y0)
       yt1 <- t(outcome)
       missing <- 1 - select
       ytmiss <- t(missing)
    
    # Terms in the selection and outcome equations
       A <- vega[3]+vega[4]*x+vega[5]*z
       B <- vega[1]+vega[2]*x
       term1 <- pbivnorm(A,B,corr)
       term0 <- pbivnorm(A,-B,corr)
       term_miss <- pnorm(-A)
       log_term1 <- log(term1)
       log_term0 <- log(term0)
       log_term_miss <- log(term_miss)
    # The log likelihood
        logl <- sum( yt1%*%log_term1 + yt0%*%log_term0 + ytmiss%*%log_term_miss)
    return(logl)
    }
    
    startv <- c(beta0,beta1,gamma0,gamma1,gamma2,rho)
    # Maxmimizing my likelihood gives
    maxLik(logLik = first.lf, start = startv, method="BFGS")
    # tanh(7.28604) = 0.9999991, far from the true value of .1
    
    # Using sampleSelection package for comparison
    outcomeF<-factor(outcome)
    selectEq <- select ~ x + z
    outcomeEq <- outcomeF ~ x
    selection( selectEq, outcomeEq)
    # Notice the value of -0.2162 for rho compared to my 0.9999991
    

1 个答案:

答案 0 :(得分:3)

在等式(19)中,纸张中存在拼写错误。从i = N_1 + 1到N的项应该是-rho而不是rho。因此,使用

term0 <- pbivnorm(A,-B,-corr)

给出

maxLik(logLik = first.lf, start = startv, method="BFGS")
# Maximum Likelihood estimation
# BFGS maximization, 40 iterations
# Return code 0: successful convergence 
# Log-Likelihood: -832.5119 (6 free parameter(s))
# Estimate(s): 0.3723783 0.9307454 0.1349979 0.4693686 0.4572421 -0.219618 

根据需要。