Mixed logit model estimation in R: estimates always equal the initial values

Time: 2018-12-04 04:45:56

Tags: r mixed-models

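The problem is that I have a wide-format data set with missing values, and the response variable is binary (yes or no). I need to estimate a mixed logit model on it, so I first impute the data with the mice package. I then convert the data to the mlogit.data class and estimate the model with gmnl. The resulting estimates are always almost identical to the initial values. This seems to be because the gradient is close to 0, so no optimization takes place at all.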

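To check how the code behaves, I ran it on another, simpler data set from a homework assignment (so I don't think the data itself is the problem). Strangely, when I delete some of its values, impute, and estimate, I get the same behavior, which might suggest an error in the imputation. But using the original data set with no missing values gives the same result. In that case I simply read the data, convert it to the mlogit class, and estimate with the built-in function in R. What could the problem be? I am now completely confused and desperate.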

Can anyone help me with this? Thanks for any help and discussion.

The simple data is here: the missing one

The code is:

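# load the packages used below
library(mice)    # mice(), complete()
library(mlogit)  # mlogit.data()
library(gmnl)    # gmnl()

# use data having missing values
data1   <- read.table("train_Titanic.csv", sep = ",", header = TRUE)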

data1$Pclass <- as.factor(data1$Pclass)
data1$Male <- as.logical(data1$Male)
data1$SibSp <- as.factor(data1$SibSp)
data1$Parch <- as.factor(data1$Parch)

part <- data1[,3:8]
imputed <- mice(part,m=7,maxit = 2)
part2 <- complete(imputed,1)
data_imputed <- cbind(data1[,1:2],part2)

data_imputed <- na.exclude(data_imputed) # drop rows that still contain NA (e.g. no response)
reg_missing <- mlogit.data(data_imputed,choice = "Survived",shape = "wide",id.var = "PassengerId")
model_missing <- gmnl(Survived ~ Pclass+Male+SibSp+Parch+Age+Fare
              | 1,
              data = reg_missing,
              model = "mixl", 
              reflevel = 2, 
              haltons = NA, 
              R = 150, 
              panel = TRUE,
              print.init = TRUE, 
              print.level = 2,
              ranp = c(Age = "n"),
              correlation = FALSE,
              iterlim = 500,
              method = "bhhh")

# use the original data having no missing values
data2 <- read.table("train_Titanic(1).csv", sep = ",", header = TRUE) # Load whole data
reg_nomissing <- mlogit.data(data2,choice = "Survived",shape = "wide",id.var = "PassengerId")
model_nomissing <- gmnl(Survived ~ Pclass+Male+SibSp+Parch+Age+Fare
                    | 1,
                    data = reg_nomissing,
                    model = "mixl", 
                    reflevel = 2, 
                    haltons = NA, 
                    R = 150, 
                    panel = TRUE,
                    print.init = TRUE, 
                    print.level = 2,
                    ranp = c(Age = "n"),
                    correlation = FALSE,
                    iterlim = 500,
                    method = "bhhh")

The output for model_nomissing is (the first model gives similar output):

Starting Values:
0:(intercept)        Pclass          Male         SibSp         Parch          Fare           Age        sd.Age 
   0.37776251   -0.03504190   -0.01583640   -0.01261044   -0.01048478   -0.74403561   -0.49059713    0.10000000 
Estimating MIXL model 
----- Initial parameters: -----
fcn value: -405.3734 
                parameter initial gradient free
0:(intercept)  0.37776251     4.710893e-11    1
Pclass        -0.03504190     2.170486e-14    1
Male          -0.01583640     4.385381e-15    1
SibSp         -0.01261044     5.162537e-15    1
Parch         -0.01048478     5.884182e-15    1
Fare          -0.74403561     5.500045e-13    1
Age           -0.49059713     1.878497e-13    1
sd.Age         0.10000000    -6.589781e-16    1
Condition number of the (active) hessian: 2.489843e+32 
-----Iteration 1 -----
--------------
gradient close to zero 
1  iterations
estimate: 0.3777625 -0.0350419 -0.0158364 -0.01261044 -0.01048478 -0.7440356 -0.4905971 0.1 
Function value: -405.3734 
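One possible reading of this output, not confirmed anywhere in the thread, is that some coefficients are simply not identified. In a conditional logit, a variable that takes the same value for every alternative and has a single generic coefficient cancels out of the choice probabilities, so the log-likelihood is flat in that coefficient. A near-zero gradient at the starting values together with a very large Hessian condition number would be consistent with that. The sketch below illustrates the effect in base R on made-up toy data; `loglik`, `num_grad`, `x_const` and `x_vary` are hypothetical names and none of this is gmnl's own code.

```r
# Hypothetical toy data: 5 individuals, 2 alternatives (NOT the Titanic data).
set.seed(1)
n      <- 5
age    <- rnorm(n)              # individual-specific value
choice <- c(1, 2, 1, 2, 2)      # chosen alternative per individual

# Conditional-logit log-likelihood with one generic coefficient `beta`
# and an alternative-specific constant `asc` for alternative 2.
# x is an n-by-2 matrix holding the attribute value for each alternative.
loglik <- function(beta, asc, x) {
  v <- cbind(beta * x[, 1], asc + beta * x[, 2])   # utilities
  p <- exp(v) / rowSums(exp(v))                    # choice probabilities
  sum(log(p[cbind(seq_len(nrow(x)), choice)]))     # log-prob of chosen alternatives
}

# Case 1: the attribute is identical across alternatives
x_const <- cbind(age, age)
# Case 2: the attribute differs across alternatives
x_vary  <- cbind(age, age + c(0.5, -1, 0.3, 1.2, -0.7))

# Central finite-difference gradient of the log-likelihood w.r.t. beta
num_grad <- function(x, beta = 0.3, asc = 0.1, h = 1e-6) {
  (loglik(beta + h, asc, x) - loglik(beta - h, asc, x)) / (2 * h)
}

g_const <- num_grad(x_const)  # zero up to floating-point noise
g_vary  <- num_grad(x_vary)   # clearly non-zero
print(c(constant = g_const, varying = g_vary))
```

If this is what is happening, individual-specific variables such as Age or Fare would usually be placed in the second part of the formula (after the `|`), where mlogit-style formulas give them alternative-specific coefficients. Whether that resolves the case in this question is an assumption that would need checking against the gmnl documentation.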

Thanks for reading.

1 Answer:

Answer 0 (score: 0)

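I just found out what I was doing wrong: I should have built wide-format panel data from the wide format, so that an attribute such as A1 has columns A1.1, A1.2.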

However, the problem still remains: the final values are the same as the initial ones.
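The layout this answer describes, where an attribute A1 is stored once per alternative as A1.1 and A1.2, can be sketched as follows. The example uses base R's `reshape()` in place of `mlogit.data()` so that it runs without extra packages; the data frame, its values, and the `alt` and `chosen` column names are made up for illustration.

```r
# Hypothetical wide data: one row per individual, attribute A1 stored
# once per alternative as A1.1 and A1.2 (names and values are made up).
wide <- data.frame(
  id     = 1:3,
  choice = c(1, 2, 2),
  A1.1   = c(0.5, 1.2, 0.3),
  A1.2   = c(0.9, 0.4, 1.1)
)

# Base-R reshape: "A1.1"/"A1.2" are split on "." into the variable
# name A1 and the alternative index, giving one row per (id, alt).
long <- reshape(
  wide,
  direction = "long",
  varying   = c("A1.1", "A1.2"),
  sep       = ".",
  timevar   = "alt",
  idvar     = "id"
)
long <- long[order(long$id, long$alt), ]

# Logical flag: TRUE on the row of the chosen alternative
long$chosen <- long$choice == long$alt
print(long)
```

In this long form A1 varies across the two alternatives within each individual, which is what a generic coefficient needs in order to be identified. `mlogit.data()` accepts `varying` and `sep` arguments for wide data that play a similar role, though the exact call for the Titanic data is not shown in the thread.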