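I'm trying to figure out how to use predict() correctly. Currently, when I call predict(), I get the message "'newdata' had 500 rows but variables found have 250 rows". I wrote this example to show what I don't understand: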
>mat1 <- matrix(rbinom(1000, 1, 0.5), ncol = 4)
>seq1 <- rbinom(250, 1, 0.5)
>model1 <- glm(seq1 ~ mat1, family=binomial)
>mat2 <- matrix(rbinom(2000, 1, 0.5), ncol = 4)
>df1 <- data.frame(mat2)
At this point, I know that the data frame (df1) needs the same names as the labels found in the model, so I do this:
>model1
Call: glm(formula = seq1 ~ mat1, family = binomial)
Coefficients:
(Intercept) mat11 mat12 mat13 mat14
-0.36483 0.21621 0.50607 -0.10879 0.02709
Degrees of Freedom: 249 Total (i.e. Null); 245 Residual
Null Deviance: 346.2
Residual Deviance: 341.4 AIC: 351.4
>colnames(df1) <- c("mat11", "mat12", "mat13", "mat14")
However, when I run predict():
>preds <- predict(model1, df1, type="response")
Warning message:
'newdata' had 500 rows but variables found have 250 rows
> str(preds)
Named num [1:250] 0.562 0.436 0.41 0.595 0.384 ...
- attr(*, "names")= chr [1:250] "1" "2" "3" "4" ...
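A likely explanation, offered as a sketch rather than a definitive diagnosis: predict() does not match newdata against the coefficient names (mat11 … mat14) but against the variables named in the formula, here `seq1` and `mat1`. Because df1 has no column called `mat1`, the lookup falls back to the `mat1` matrix in the formula's environment, which has 250 rows. The snippet below reproduces the example (the `set.seed()` call is an addition for reproducibility) and prints the names the formula actually uses.

```r
set.seed(1)  # added for reproducibility; not in the original example
mat1 <- matrix(rbinom(1000, 1, 0.5), ncol = 4)
seq1 <- rbinom(250, 1, 0.5)
model1 <- glm(seq1 ~ mat1, family = binomial)

# Variables the formula refers to; newdata is searched for these names
all.vars(formula(model1))  # "seq1" "mat1"

# Coefficient names are the matrix name plus a column index
names(coef(model1))

df1 <- data.frame(matrix(rbinom(2000, 1, 0.5), ncol = 4))
colnames(df1) <- c("mat11", "mat12", "mat13", "mat14")

# df1 has no column named "mat1", so the 250-row mat1 from the
# environment is used instead; this triggers the 500 vs 250 rows warning
preds <- predict(model1, df1, type = "response")
length(preds)  # 250
```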
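Is this an incorrect use of predict()? I need to be able to use it to make predictions on a test dataset that is larger than the training set.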
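One common pattern that avoids the mismatch, sketched here under the assumption that the predictors can be stored in a data frame: fit the model with a `data =` argument so the formula refers to individual columns, then pass a test data frame with the same column names. The names `train`, `test` and `model2` are illustrative only; `data.frame()` on an unnamed matrix assigns the column names X1 to X4 automatically, so both data frames line up.

```r
set.seed(42)  # for reproducibility

# Training data: predictors (X1..X4) and response in one data frame
train <- data.frame(matrix(rbinom(1000, 1, 0.5), ncol = 4))
train$seq1 <- rbinom(250, 1, 0.5)

# "seq1 ~ ." uses every other column of train as a predictor
model2 <- glm(seq1 ~ ., data = train, family = binomial)

# Test data: 500 rows with the same column names X1..X4 (no response needed)
test <- data.frame(matrix(rbinom(2000, 1, 0.5), ncol = 4))

preds <- predict(model2, newdata = test, type = "response")
length(preds)  # 500
```

Because the formula names the columns X1 to X4 rather than a single matrix, newdata can have any number of rows as long as those columns are present.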