Question

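I'm trying to figure out how to use predict() correctly. Currently, when I call predict(), I get the warning "'newdata' had 500 rows but variables found have 250 rows". I wrote this example to highlight what I don't understand: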

>mat1 <- matrix(rbinom(1000, 1, 0.5), ncol = 4)
>seq1 <- rbinom(250, 1, 0.5)
>model1 <- glm(seq1 ~ mat1, family=binomial)

>mat2 <- matrix(rbinom(2000, 1, 0.5), ncol = 4)
>df1 <- data.frame(mat2)

At this point, I know the data frame (df1) needs the same labels as those found in the model, so I do this:

>model1
Call:  glm(formula = seq1 ~ mat1, family = binomial)

Coefficients:
(Intercept)        mat11        mat12        mat13        mat14  
   -0.36483      0.21621      0.50607     -0.10879      0.02709  

Degrees of Freedom: 249 Total (i.e. Null);  245 Residual
Null Deviance:      346.2 
Residual Deviance: 341.4        AIC: 351.4

>colnames(df1) <- c("mat11", "mat12", "mat13", "mat14")

However, when I run the prediction:

>preds <- predict(model1, df1, type="response")
Warning message:
'newdata' had 500 rows but variables found have 250 rows 
> str(preds)
 Named num [1:250] 0.562 0.436 0.41 0.595 0.384 ...
 - attr(*, "names")= chr [1:250] "1" "2" "3" "4" ...

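Is this an incorrect use of predict()? I need to be able to use it to make predictions on a test dataset that is larger than the training set.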

Why is the output of predict() limited to the size of the dataset the model was trained on?
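
For comparison, here is a minimal sketch of the pattern that usually avoids this warning. It assumes the cause is that the formula `seq1 ~ mat1` contains a single term, the matrix `mat1`, so predict() looks in newdata for a variable named `mat1` rather than for columns named `mat11` to `mat14`, and falls back to the 250-row matrix in the workspace when it does not find one. The names `train`, `test`, `model2`, `preds2`, and the seed are illustrative and not part of the example above; `X1` to `X4` are the default column names data.frame() gives an unnamed matrix.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Training data: predictors stored as named columns of a data frame
train <- data.frame(matrix(rbinom(1000, 1, 0.5), ncol = 4))  # columns X1..X4
train$y <- rbinom(250, 1, 0.5)

# The formula refers to column names, not to a matrix object in the workspace
model2 <- glm(y ~ X1 + X2 + X3 + X4, data = train, family = binomial)

# New data: 500 rows, with the same column names as the predictors in the formula
test <- data.frame(matrix(rbinom(2000, 1, 0.5), ncol = 4))   # columns X1..X4

# predict() now finds X1..X4 in newdata and returns one value per row of test
preds2 <- predict(model2, newdata = test, type = "response")
length(preds2)
```

With the model fitted this way, the number of predictions should follow the number of rows in `test` rather than the size of the training set.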

0 answers: