Low prediction accuracy when using preProcess in train()

Asked: 2019-02-10 13:18:12

Tags: r machine-learning r-caret

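I'm new to R and machine learning, so please forgive the basic question...

I am experimenting with the "spam" dataset from the kernlab library, using functions from the caret library.

Goal:

Predict "type" in "spam" from the other 57 of its 58 variables.

I tried two different ways of preprocessing:

First, preprocessing the dataset before train()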

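library(caret)

# trainset and testset: a train/test split of kernlab's spam data (split not shown)
# preprocess all 57 predictors, leave out response variable #58
preproc = preProcess(trainset[, -58], method = "BoxCox")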

preprocTrain = predict(preproc, trainset[,-58])
preprocTrain$type = trainset$type

preprocTest = predict(preproc, testset[,-58])
preprocTest$type = testset$type

set.seed(123)
fit2 = train(type~., data=preprocTrain, method = "glm")
predict2 = predict(fit2, newdata = preprocTest)
confmat2 = confusionMatrix(predict2, preprocTest$type)

fit2$results
confmat2$overall

Note:

fit2 Accuracy = 0.93 and confmat2 Accuracy = 0.92

Then, using preProcess inside train()

set.seed(123)
fit3 = train(type~., data=trainset, method="glm", preProcess = "BoxCox")

Predict using pre-processed test set from before

predict3 = predict(fit3, newdata = preprocTest)
confmat3 = confusionMatrix(predict3, preprocTest$type)

fit3$results
confmat3$overall

Now, fit3 Accuracy = 0.93 and confmat3 Accuracy = 0.75
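For comparison, a model trained with preProcess = "BoxCox" stores the transformation, and predict() on the resulting train object applies it to newdata itself. Passing the already transformed preprocTest would therefore transform those columns a second time. The sketch below predicts on the untransformed test set instead. The split via createDataPartition() with p = 0.75 is an assumption made only so the example runs on its own (the original split is not shown), and the names pred_raw and confmat_raw are hypothetical.

```r
library(caret)
library(kernlab)

data(spam)

# Hypothetical split: the original question does not show how trainset/testset were made
set.seed(123)
idx      <- createDataPartition(spam$type, p = 0.75, list = FALSE)
trainset <- spam[idx, ]
testset  <- spam[-idx, ]

# Box-Cox is estimated inside train() and stored with the fitted model
set.seed(123)
fit3 <- train(type ~ ., data = trainset, method = "glm", preProcess = "BoxCox")

# Pass the RAW test set: predict() applies the stored Box-Cox transformation once
pred_raw    <- predict(fit3, newdata = testset)
confmat_raw <- confusionMatrix(pred_raw, testset$type)

confmat_raw$overall["Accuracy"]
```

If the double transformation is indeed the cause, this accuracy should land close to the resampled accuracy in fit3$results rather than near 0.75.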

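Please help me understand the reason for this sharp drop. Shouldn't the confmat3 accuracy be the same as the confmat2 accuracy? Where is the difference? Also, in the second prediction I get the following warnings: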

Warning messages:
1: In predict.BoxCoxTrans(object$bc[[i]], newdata[, i]) :
  newdata should have values > 0
2: In predict.BoxCoxTrans(object$bc[[i]], newdata[, i]) :
  newdata should have values > 0

Thanks!

0 answers