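The problem is that I have a wide dataset with missing values, and the response variable is binary (yes or no). I need to estimate a mixed logit model on it, so I first impute the data with the mice package. After that, I convert the data to the mlogit.data class and estimate the model with gmnl. The estimates always come out almost identical to the starting values. This seems to happen because the gradient is close to 0, so no optimization takes place at all.
To check how this behaves, I used another, simpler dataset from an assignment (so I do not think the data themselves are the problem). Strangely, when I delete some of its values, impute them, and estimate, the result is the same, which suggests there might be a mistake in the imputation? But the original dataset with no missing values gives the same result. In that case I simply read the data, convert it to the mlogit class, and estimate with the built-in functions in R. What could the problem be? I am completely confused and desperate at this point.
Can anyone help me with this? Thanks for any help and discussion.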
The simple data are here: the missing one
The code is:
# packages used below
library(mice)
library(mlogit)
library(gmnl)

# use data having missing values
data1 <- read.table("train_Titanic.csv", sep = ",", header = TRUE)
data1$Pclass <- as.factor(data1$Pclass)
data1$Male <- as.logical(data1$Male)
data1$SibSp <- as.factor(data1$SibSp)
data1$Parch <- as.factor(data1$Parch)
part <- data1[, 3:8]                          # covariates to impute
imputed <- mice(part, m = 7, maxit = 2)       # 7 imputed data sets, 2 iterations each
part2 <- complete(imputed, 1)                 # take the first imputed data set
data_imputed <- cbind(data1[, 1:2], part2)
data_imputed <- na.exclude(data_imputed) # drop the data without response
reg_missing <- mlogit.data(data_imputed, choice = "Survived", shape = "wide", id.var = "PassengerId")
model_missing <- gmnl(Survived ~ Pclass + Male + SibSp + Parch + Age + Fare | 1,
                      data = reg_missing,
                      model = "mixl",
                      reflevel = 2,
                      haltons = NA,            # pseudo-random draws instead of Halton sequences
                      R = 150,                 # number of simulation draws
                      panel = TRUE,
                      print.init = TRUE,
                      print.level = 2,
                      ranp = c(Age = "n"),     # normally distributed coefficient on Age
                      correlation = FALSE,
                      iterlim = 500,
                      method = "bhhh")
# use the original data having no missing values
data2 <- read.table("train_Titanic(1).csv", sep = ",", header = TRUE) # Load whole data
reg_nomissing <- mlogit.data(data2,choice = "Survived",shape = "wide",id.var = "PassengerId")
model_nomissing <- gmnl(Survived ~ Pclass + Male + SibSp + Parch + Age + Fare | 1,
                        data = reg_nomissing,
                        model = "mixl",
                        reflevel = 2,
                        haltons = NA,
                        R = 150,
                        panel = TRUE,
                        print.init = TRUE,
                        print.level = 2,
                        ranp = c(Age = "n"),
                        correlation = FALSE,
                        iterlim = 500,
                        method = "bhhh")
The output for model_nomissing is (the one for model_missing is similar):
Starting Values:
0:(intercept) Pclass Male SibSp Parch Fare Age sd.Age
0.37776251 -0.03504190 -0.01583640 -0.01261044 -0.01048478 -0.74403561 -0.49059713 0.10000000
Estimating MIXL model
----- Initial parameters: -----
fcn value: -405.3734
parameter initial gradient free
0:(intercept) 0.37776251 4.710893e-11 1
Pclass -0.03504190 2.170486e-14 1
Male -0.01583640 4.385381e-15 1
SibSp -0.01261044 5.162537e-15 1
Parch -0.01048478 5.884182e-15 1
Fare -0.74403561 5.500045e-13 1
Age -0.49059713 1.878497e-13 1
sd.Age 0.10000000 -6.589781e-16 1
Condition number of the (active) hessian: 2.489843e+32
-----Iteration 1 -----
--------------
gradient close to zero
1 iterations
estimate: 0.3777625 -0.0350419 -0.0158364 -0.01261044 -0.01048478 -0.7440356 -0.4905971 0.1
Function value: -405.3734
Thanks for reading.
Answer 0 (score: 0)
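I just found out what I did wrong: I should have created wide panel data from the wide format, so that an attribute such as A1 gets columns A1.1, A1.2.
However, the problem remains: the final values are still the same as the initial ones.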
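One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the covariates in the formula (Age, Fare, and so on) are passenger-level values that are identical for both alternatives, a generic coefficient on them cancels out of the utility difference, so the log-likelihood does not depend on it and its gradient is zero. The base-R sketch below illustrates this on simulated data with a plain (non-mixed) binary logit. The names loglik, num_grad, x_alt and the toy data are all hypothetical and are not taken from the post or from gmnl.

```r
# Toy data: one covariate per individual, binary choice (1 = alternative "1" chosen).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))

# Conditional-logit log-likelihood with utilities
#   V0 = beta * x0            (reference alternative)
#   V1 = alpha + beta * x1
# beta is a generic coefficient shared by both alternatives.
loglik <- function(par, x0, x1, y) {
  alpha <- par[1]
  beta <- par[2]
  v_diff <- alpha + beta * (x1 - x0)  # only the utility difference matters
  sum(y * plogis(v_diff, log.p = TRUE) +
      (1 - y) * plogis(-v_diff, log.p = TRUE))
}

# Central finite-difference gradient of f at par.
num_grad <- function(f, par, ..., h = 1e-6) {
  sapply(seq_along(par), function(k) {
    e <- replace(numeric(length(par)), k, h)
    (f(par + e, ...) - f(par - e, ...)) / (2 * h)
  })
}

par0 <- c(0, -0.5)  # arbitrary starting values (alpha, beta)

# Case 1: the covariate is the same for both alternatives.
g_same <- num_grad(loglik, par0, x0 = x, x1 = x, y = y)

# Case 2: the covariate differs across alternatives.
x_alt <- x + rnorm(n)
g_vary <- num_grad(loglik, par0, x0 = x, x1 = x_alt, y = y)

print(g_same)  # second element (beta) is zero
print(g_vary)  # second element (beta) is not zero
```

In the first case the optimizer has no direction in which to move beta, which would be consistent with "gradient close to zero" after one iteration and with a nearly singular Hessian. Drawing a random coefficient on such a covariate does not change this, because the difference it multiplies is zero for every draw. If this is what is happening here, it may be worth checking whether the individual-specific variables belong in the second part of the mlogit-style formula (after the `|`), where they receive alternative-specific coefficients.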