I have the following code, in which I first fit an SVM with the default parameters and then tune the parameters using 10-fold CV:
library(readr)
library("e1071")
wdbc <- read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data",
                 col_names = FALSE,
                 col_types = cols(X1 = col_skip(),
                                  X2 = col_factor(levels = c("M", "B"))))
smp_size <- floor(0.75 * nrow(wdbc))
set.seed(2)
train_ind <- sample(seq_len(nrow(wdbc)), size = smp_size)
train <- wdbc[train_ind, ]
test <- wdbc[-train_ind, ]
model <- svm(X2~., data=train, kernel="radial", probability = TRUE)
predicted <- predict(model, test[,-1], probability = TRUE)
CM <- table(test$X2, predicted)
print(CM)
svm_tune <- tune(svm, train[,-1], train.y=train$X2,
                 kernel="radial", ranges=list(cost=10^(-1:2), gamma=c(.5,1,2)))
summary(svm_tune)
model_after_tune <- svm(X2~., data=train, kernel="radial", probability = TRUE, gamma = 0.5, cost = 10)
predicted <- predict(model_after_tune, test[,-1], probability = TRUE)
#attr(predicted, "probabilities")
CM <- table(test$X2, predicted)
print(CM)
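When I print the confusion matrix for the default SVM's predictions, I get 3 misclassified points. After tuning the parameters, I get 15. I tried other seeds to see whether I could get better results, but the default parameters always seem to work better. Any clue why this happens? Thanks.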
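One thing that may be worth checking is whether the tuning grid contains the default settings at all. As far as I understand, `svm()` in e1071 defaults to `cost = 1` and `gamma = 1 / (number of predictors)`, and this data set has 30 predictors once the ID and label columns are removed. The sketch below is only a sanity check in base R; the names `n_features`, `default_in_grid` and `wider_grid` are mine, and the wider grid is an arbitrary illustration rather than a recommended setting.

```r
# Assumption: 30 predictors remain after dropping X1 (ID) and X2 (label).
n_features <- 30

# Defaults used by e1071::svm() for a radial kernel on matrix-like input.
default_gamma <- 1 / n_features
default_cost  <- 1

# The grid that was passed to tune() in the code above.
grid <- expand.grid(cost = 10^(-1:2), gamma = c(0.5, 1, 2))

# Does any grid point coincide with the default (cost, gamma) pair?
default_in_grid <- any(abs(grid$gamma - default_gamma) < 1e-12 &
                         grid$cost == default_cost)
print(default_in_grid)  # FALSE: every gamma in the grid is at least 15x the default

# A hypothetical wider grid that also covers the default gamma.
wider_grid <- list(cost  = 10^(-1:2),
                   gamma = sort(c(default_gamma, 10^(-3:0))))
```

Passing `ranges = wider_grid` to the `tune()` call would at least let the cross-validation compare the defaults against the other candidates. I have not verified that this changes the result.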