Probit ordinal logistic regression with `MASS::polr`: how to make predictions on new data

Date: 2016-10-20 13:24:24

Tags: r regression logistic-regression predict ordinal

I want to do ordinal regression in R, so I want to use the polr function from the MASS package. First I create a model like this:

model <- polr(labels ~ var1 + var2, Hess = TRUE)  

Now I want to use the model to predict new cases. I thought that would simply be:

pred <- predict(model, data = c(newVar1, newVar2))  

However, it seems that predict is somehow predicting on the training set, not the new data. My training set has 2000 examples and my new data has 700 examples, yet I still get 2000 predicted labels.

So my question is: how do I use polr to make predictions on new data?

1 answer:

Answer 0 (score: 5)

Sadly, there is no documentation entry for `predict.polr`; otherwise you could simply read it to see how to use `predict` properly.

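In R, only a few primitive model-fitting functions, such as `smooth.spline`, have a `predict` method that expects a vector of new data (which is reasonable, since `smooth.spline` handles univariate regression). In general, `predict` expects a data frame or list whose names match the variables specified in the model formula, or as shown in the model frame (the "terms" attribute). If you fit a model: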

labels ~ var1 + var2

then you should construct `newdata` as a data frame or a list with those names:

predict(model, newdata = data.frame(var1 = newVar1, var2 = newVar2))

predict(model, newdata = list(var1 = newVar1, var2 = newVar2))

Note that the argument to `predict` is `newdata`, not `data`.
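To make the difference concrete, here is a minimal sketch of the situation from the question. The data are simulated purely for illustration (the coefficients, cut points, and level names are assumptions, not the asker's data); only the variable names `var1`, `var2`, `newVar1`, and `newVar2` come from the question.

```r
library(MASS)

set.seed(1)  # simulated data, purely for illustration

# Hypothetical training data: 2000 cases, two numeric predictors
var1 <- rnorm(2000)
var2 <- rnorm(2000)
latent <- var1 - 0.5 * var2 + rlogis(2000)
labels <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
              labels = c("low", "mid", "high"), ordered_result = TRUE)

model <- polr(labels ~ var1 + var2, Hess = TRUE)

# Hypothetical new data: 700 cases
newVar1 <- rnorm(700)
newVar2 <- rnorm(700)

# `data` is not an argument of predict.polr, so it is swallowed by `...`
# and the predictions for the training set are returned
pred_train <- predict(model, data = data.frame(var1 = newVar1, var2 = newVar2))
length(pred_train)

# `newdata` with matching column names gives one prediction per new case
pred_new <- predict(model, newdata = data.frame(var1 = newVar1, var2 = newVar2))
length(pred_new)
```

The first call should return as many labels as there are training cases, and the second as many as there are new cases.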

Since there is no documentation, it helps to look at the arguments:

args(MASS:::predict.polr)
#function (object, newdata, type = c("class", "probs"), ...) 
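The `type` argument shown above selects between predicted classes and predicted probabilities. The following sketch uses the `housing` data shipped with MASS; the two new cases are arbitrary illustrative values, chosen from levels that occur in that data set.

```r
library(MASS)

house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# Two illustrative new cases; the values are levels that occur in `housing`
new_cases <- data.frame(Infl = c("Low", "High"),
                        Type = c("Tower", "Terrace"),
                        Cont = c("Low", "High"))

# Default type = "class": the most probable category for each row
pred_class <- predict(house.plr, newdata = new_cases)

# type = "probs": one row per case, one column per response level
pred_probs <- predict(house.plr, newdata = new_cases, type = "probs")
pred_probs
rowSums(pred_probs)  # each row sums to 1
```

With a single new case, `type = "probs"` returns a plain named vector rather than a one-row matrix, so code that indexes the result by row may need to handle that case.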

You can even check the source code (it is not long):

MASS:::predict.polr

Inside the source code you will see:

newdata <- as.data.frame(newdata)
m <- model.frame(Terms, newdata, na.action = function(x) x, 
       xlev = object$xlevels)

This explains why `newdata` should be passed as a data frame, and why the variable names must match those in `Terms`.
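A few consequences of this name-based lookup can be checked directly. The sketch below again uses the `housing` data from MASS; the extra `id` column and the level `"Very high"` are made up for the demonstration.

```r
library(MASS)

house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

reference <- predict(house.plr,
                     newdata = data.frame(Infl = "Low", Type = "Tower", Cont = "Low"))

# Column order is irrelevant and extra columns are ignored, because
# model.frame() looks variables up by name
shuffled <- predict(house.plr,
                    newdata = data.frame(id = 1, Cont = "Low", Type = "Tower", Infl = "Low"))
identical(reference, shuffled)

# A named list works as well, since it is coerced with as.data.frame()
from_list <- predict(house.plr,
                     newdata = list(Infl = "Low", Type = "Tower", Cont = "Low"))
identical(reference, from_list)

# A value that was not a factor level during fitting cannot be matched
# against object$xlevels, so model.frame() signals an error
bad <- tryCatch(
  predict(house.plr,
          newdata = data.frame(Infl = "Very high", Type = "Tower", Cont = "Low")),
  error = function(e) e
)
inherits(bad, "error")
```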

Here is a reproducible example:

library(MASS)
house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

## check model terms inside model frame
attr(terms(house.plr$model), "term.labels")
# [1] "Infl" "Type" "Cont"

When making predictions, these do not work:

## `data` is ignored, as predict.polr has no such argument
predict(house.plr, data = data.frame("Low", "Tower", "Low"))
## column names do not match the model variables
predict(house.plr, newdata = data.frame("Low", "Tower", "Low"))

This works:

predict(house.plr, newdata = data.frame(Infl = "Low", Type = "Tower", Cont = "Low"))

#[1] Low
#Levels: Low Medium High
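The same call scales to many new cases at once. As a sketch, assuming you want a prediction for every combination of predictor levels in `housing`, `expand.grid` can build the `newdata` frame with the right column names:

```r
library(MASS)

house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# All combinations of the predictor levels seen in `housing`
grid <- expand.grid(Infl = levels(housing$Infl),
                    Type = levels(housing$Type),
                    Cont = levels(housing$Cont))

# One predicted class per row of `grid`, stored next to its inputs
grid$pred <- predict(house.plr, newdata = grid)
head(grid)
```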