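I am trying to use the predict function to get predictions from a logistic regression, and the result has the wrong number of rows. This has already been asked in "R Warning: newdata' had 15 rows but variables found have 22 rows".
I tried the approach suggested there, but I still get the error. Here is the code: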
# Split as training and test sets
train_idx <- trainTestSplit(adult,trainPercent=75,seed=1111)
train <- adult[train_idx, ]
test <- adult[-train_idx, ]
xtrain <- train[,1:7]
ytrain <- train[,8]
xtrain1 <- dummy.data.frame(xtrain, sep = ".")
xtrain2 <- as.matrix(xtrain1)
xtest <- test[,1:7]
ytest <- test[,8]
xtest1 <- dummy.data.frame(xtest, sep = ".")
xtest2 <- as.matrix(xtest1)
fit=glm(ytrain~xtrain2,family=binomial)
a=predict(fit,newdata=xtrain1,type="response")
b=ifelse(a>0.5,1,0)
confusionMatrix(b,ytrain)
Confusion Matrix and Statistics
          Reference
Prediction     0     1
         0 16065  3157
         1   968  2430
               Accuracy : 0.8176
                 95% CI : (0.8125, 0.8227)
# Predict with test dataframe
a=predict(fit,xtest1,type="response")
: 'newdata' had 7541 rows but variables found have 22620 rows
2: In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
prediction from a rank-deficient fit may be misleading
>
I also tried
names(xtest1)=names(xtrain1) and
a=predict(fit,xtest1,type="response")
The column names are the same anyway, but I get the same error. This seems like it should be a very straightforward problem. Please help...
Answer 0: (score: 0)
I changed the fit to use 'data' with a formula instead of a separate matrix and y column, and now it works:
adult1 <- dummy.data.frame(adult, sep = ".")
train_idx <- trainTestSplit(adult1,trainPercent=75,seed=1111)
train <- adult1[train_idx, ]
test <- adult1[-train_idx, ]
fit=glm(salary~.,family=binomial,data=train)
a=predict(fit,newdata=train,type="response")
b=ifelse(a>0.5,1,0)
confusionMatrix(b,train$salary)
m=predict(fit,newdata=test,type="response")
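A likely reason the change helps: with glm(ytrain~xtrain2), the only predictor name in the formula is xtrain2, an object in the workspace. predict() looks for a column with that name in newdata, does not find one, and falls back to the workspace matrix, so the predictions come out with the training row count. The sketch below reproduces this on a small synthetic data set using base R only. The columns age and hours, the sample sizes, and the names fit_matrix, fit_df, p_bad and p_ok are made up for illustration and are not part of the adult data or the original code.

```r
# Synthetic stand-in for the 'adult' data (hypothetical columns and sizes).
set.seed(1111)
n <- 200
dat <- data.frame(
  age   = rnorm(n, mean = 40, sd = 10),
  hours = rnorm(n, mean = 40, sd = 5)
)
dat$salary <- rbinom(n, size = 1,
                     prob = plogis(0.05 * (dat$age - 40) + 0.1 * (dat$hours - 40)))

train_idx <- sample(n, size = 150)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Pattern from the question: the response vector and predictor matrix are
# free-standing objects, so the formula only knows the name 'xtrain2'.
ytrain  <- train$salary
xtrain2 <- as.matrix(train[, c("age", "hours")])
fit_matrix <- glm(ytrain ~ xtrain2, family = binomial)

# 'test' has no column called 'xtrain2', so the workspace matrix is used
# instead. The row-count warning is suppressed here to keep the output quiet.
p_bad <- suppressWarnings(predict(fit_matrix, newdata = test, type = "response"))
length(p_bad) == nrow(train)   # predictions are for the training rows

# Pattern from the answer: formula plus data=, so variables are looked up
# by column name in whatever data frame is passed as newdata.
fit_df <- glm(salary ~ ., family = binomial, data = train)
p_ok <- predict(fit_df, newdata = test, type = "response")
length(p_ok) == nrow(test)     # one prediction per test row
```

This also suggests why renaming the columns of xtest1 made no difference in the question: the model never refers to those column names, only to xtrain2.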