Question

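Good afternoon. I am having trouble with the output I get when running a logistic regression with the nnet package. I want to predict Category from HS_TR (Return Period) and SLR (Sea Level Rise). The multinomial model, named fit, was computed using the data in the subset x.sub. There are 4 different categories: 1, 2, 3 or 4.

x.sub: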

   POINTID  HS_TR  SLR  Category
       4     10    0.0     3
       4     10    0.6     4
       4     50    0.0     3
       4     50    0.6     4
       4    100    0.0     4
       4    100    0.6     4

When I run the model

    > fit <- multinom(Category ~ HS_TR + SLR, x.sub, maxit=3000)

I get this result:

Coefficients:
    (Intercept)       HS_TR         SLR 
    -30.5791517   0.4130478  62.0976951 

    Residual Deviance: 0.0001820405 
    AIC: 6.000182

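Now that I have the multinomial model, I want to know the predicted category for a specific scenario (d3) of SLR and HS_TR. I define d3 and apply predict, and I get a reasonable result: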

d3<-data.frame("HS_TR"=c(10),"SLR"=c(0))
prediction <-(predict(fit,d3))

I get

> prediction
[[1]]
[1] 3 
Level: 3

However, when I compute the probabilities for the prediction with prediction <-(predict(fit,d3, type="probs")), I get the following:

> prediction
[[1]]
1 
0

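This does not make any sense to me, because it says the probability is 0. Since the model I ran gives a prediction for Category, I do not understand why the probability is then 0. Does anyone know why I get this?

If anyone knows how to deal with this problem so that I can fix it, thanks in advance.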

Answer 1

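You have a problem with separation / complete separation (search for that term to get more information). This page gives a nice introduction, which contains this quote:

"A complete separation happens when the outcome variable separates a predictor variable or a combination of predictor variables completely."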

If you look at your data, for example using xtabs,

    > xtabs(~ Category + HS_TR + SLR, data=x.sub)
    , , SLR = 0

            HS_TR
    Category 10 50 100
           3  1  1   0
           4  0  0   1

    , , SLR = 0.6

            HS_TR
    Category 10 50 100
           3  0  0   0
           4  1  1   1

then you will find that the combination of HS_TR and SLR determines the outcome of Category perfectly. You need to specify a simpler model or get more data to obtain a stable fit.
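One way to see the separation concretely is to check whether a single linear rule in HS_TR and SLR classifies every row correctly. The sketch below is only an illustration: x.sub is retyped by hand from the table in the question, the coefficients are the ones multinom printed there, and the names eta and rule are made up for this example.

```r
# Rebuild the six rows of x.sub shown in the question.
x.sub <- data.frame(
  POINTID  = 4,
  HS_TR    = c(10, 10, 50, 50, 100, 100),
  SLR      = c(0, 0.6, 0, 0.6, 0, 0.6),
  Category = c(3, 4, 3, 4, 4, 4)
)

# Linear predictor built from the coefficients reported in the question.
eta <- -30.5791517 + 0.4130478 * x.sub$HS_TR + 62.0976951 * x.sub$SLR

# Hypothetical rule: positive linear predictor -> Category 4, otherwise 3.
rule <- ifelse(eta > 0, 4, 3)

# Compare the rule with the observed categories, row by row.
print(cbind(x.sub, eta = round(eta, 2), rule = rule))
print(all(rule == x.sub$Category))
```

If a rule like this reproduces every observed category, the likelihood can keep improving as the coefficients are scaled up, which is why the estimates drift toward very large values instead of settling.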

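In your case the outcome has only two possible categories, so you should be able to fit a log-linear model or a logistic regression model and get the same results. If you create a new variable Cat that is a factor version of Category, then you will see a warning that points you in the right direction.

    > glm(Cat ~HS_TR + SLR, data=x.sub, family="binomial")
    Warning message:
    glm.fit: fitted probabilities numerically 0 or 1 occurred

I think multinom does not detect the problem in the data. However, if you look at the summary of the fit, you will find that the standard errors of the parameter estimates are extremely large. This is also a sign that the estimates are unstable and that separation may be a problem.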

    > summary(fit)
    Call:
    multinom(formula = Category ~ HS_TR + SLR, data = x.sub, maxit = 3000)

    Coefficients:
                     Values  Std. Err.
    (Intercept) -30.5791517 356.932851
    HS_TR         0.4130478   5.137396
    SLR          62.0976951 634.584184

    Residual Deviance: 0.0001820405
    AIC: 6.000182

I think the convergence check in multinom is missing some kind of test for this situation.
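On the 0 itself: as far as I can tell, when the response has only two levels, predict(..., type="probs") on a multinom fit returns a single number per row, the probability of the second level (here Category 4), and the 1 printed above it is the row name of d3 rather than a category. A value of about 0 for Category 4 would then agree with the predicted class 3. A minimal sketch under that assumption, with x.sub retyped from the question, trace = FALSE added only to silence the iteration log, and p4 and probs being names invented for this example:

```r
library(nnet)  # provides multinom()

# Data retyped from the question; Category is stored as a two-level factor.
x.sub <- data.frame(
  HS_TR    = c(10, 10, 50, 50, 100, 100),
  SLR      = c(0, 0.6, 0, 0.6, 0, 0.6),
  Category = factor(c(3, 4, 3, 4, 4, 4))
)

fit <- multinom(Category ~ HS_TR + SLR, data = x.sub,
                maxit = 3000, trace = FALSE)

d3 <- data.frame(HS_TR = 10, SLR = 0)

# Assumption: with two response levels this is P(Category == "4") per row.
p4 <- predict(fit, d3, type = "probs")

# Expand to one column per category so the result is easier to read.
probs <- cbind("3" = 1 - p4, "4" = p4)
print(probs)

# The predicted class should match the larger of the two probabilities.
print(predict(fit, d3, type = "class"))
```

Because of the separation problem described above, these probabilities will sit at essentially 0 and 1, so they say more about the tiny data set than about real uncertainty.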

Probabilities from Multinomial Regression nnet package
