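I am trying to run the code from the book "Applied Predictive Modeling", from the section on training an SVM with a radial kernel using caret's train function.
I copied the code without adding anything. It runs without errors, but the results do not match those in the book: all predicted probabilities are nearly identical, and every sample is assigned to a single class. Here is the code: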
library(caret)
data("GermanCredit")
GermanCredit <- GermanCredit[, -nearZeroVar(GermanCredit)]
# remove some other columns that do not add useful information
GermanCredit$CheckingAccountStatus.lt.0 <- NULL
GermanCredit$SavingsAccountBonds.lt.100 <- NULL
GermanCredit$EmploymentDuration.lt.1 <- NULL
GermanCredit$EmploymentDuration.Unemployed <- NULL
GermanCredit$Personal.Male.Married.Widowed <- NULL
GermanCredit$Property.Unknown <- NULL
GermanCredit$Housing.ForFree <- NULL
#Split the data into training (80%) and test sets (20%)
set.seed(100)
inTrain <- createDataPartition(GermanCredit$Class, p = .8)[[1]]
GermanCreditTrain <- GermanCredit[ inTrain, ]
GermanCreditTest <- GermanCredit[-inTrain, ]
set.seed(1056)
svmFit <- train(Class ~ .,
                data = GermanCreditTrain,
                method = "svmRadial",
                preProcess = c("center", "scale"),
                tuneLength = 10,
                trControl = trainControl(method = "repeatedcv", repeats = 5,
                                         classProbs = TRUE))
The model output is as follows:
> svmFit
Support Vector Machines with Radial Basis Function Kernel
800 samples
41 predictor
2 classes: 'Bad', 'Good'
Pre-processing: centered (41), scaled (41)
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 720, 720, 720, 720, 720, 720, ...
Resampling results across tuning parameters:
  C       Accuracy  Kappa
    0.25  0.70025   0.006361713
    0.50  0.70025   0.006372290
    1.00  0.70025   0.006372290
    2.00  0.70075   0.008001058
    4.00  0.70100   0.009101928
    8.00  0.69950   0.004902168
   16.00  0.70050   0.006864093
   32.00  0.70025   0.006361713
   64.00  0.70050   0.007509254
  128.00  0.70050   0.007472237
Tuning parameter 'sigma' was held constant at a value of 0.01390712
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.01390712 and C = 4.
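So the accuracy barely changes across the values of C. I tried different sets of parameters, but the result was the same.
All samples get nearly identical probabilities: about 0.304 for the "Bad" class and about 0.695 for "Good" (they differ only in the fourth digit).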
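For anyone who wants to check the same symptom, here is a minimal sketch of how the probabilities can be summarised. The helper summarise_probs is my own hypothetical name, not something from the book or from caret, and the demo data frame is synthetic; it only assumes a data frame with one probability column per class, which is what predict(svmFit, ..., type = "prob") returns.

```r
# Hypothetical helper (not from the book): summarise a data frame of class
# probabilities with one column per class.
summarise_probs <- function(probs) {
  # Class with the highest probability in each row
  predicted <- factor(
    colnames(probs)[max.col(as.matrix(probs), ties.method = "first")],
    levels = colnames(probs)
  )
  list(
    range  = sapply(probs, range),  # min and max probability per class
    counts = table(predicted)       # number of samples assigned to each class
  )
}

# Synthetic probabilities that mimic the pattern described above
demo <- data.frame(Bad  = c(0.3041, 0.3043, 0.3042),
                   Good = c(0.6959, 0.6957, 0.6958))
res <- summarise_probs(demo)
print(res)

# With the fitted model from above, the call would be:
# summarise_probs(predict(svmFit, newdata = GermanCreditTest, type = "prob"))
```

A very narrow range per class together with all counts falling into one class is the behaviour I am seeing.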
The book's results are available here: https://github.com/cran/AppliedPredictiveModeling/blob/master/inst/chapters/04_Over_Fitting.Rout
They show:
> svmFit
Support Vector Machines with Radial Basis Function Kernel
800 samples
41 predictors
2 classes: 'Bad', 'Good'
Pre-processing: centered, scaled
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 720, 720, 720, 720, 720, 720, ...
Resampling results across tuning parameters:
  C     Accuracy  Kappa  Accuracy SD  Kappa SD
  0.25  0.744     0.362  0.0499       0.113
  0.5   0.74      0.35   0.0516       0.117
  1     0.746     0.348  0.0522       0.125
  2     0.743     0.325  0.0467       0.116
  4     0.744     0.322  0.0477       0.12
  8     0.75      0.323  0.0464       0.13
  16    0.745     0.302  0.0457       0.13
  32    0.739     0.28   0.0451       0.126
  64    0.743     0.284  0.0444       0.135
  128   0.734     0.265  0.0445       0.124
Tuning parameter 'sigma' was held constant at a value of 0.008918477
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.00892 and C = 8.
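In addition, the whole class got this same result, but our teacher, whose computer has an older version of R, got the correct results. So my question is: is the problem caused by some change in newer versions of R, caret, kernlab, etc., or am I doing something else wrong? How can I change this code to get the correct results? The caret version is 6.0-77.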
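Since the difference seems tied to software versions, here is a small sketch for collecting the versions on each machine so they can be compared. The package list is my assumption about which packages matter; a package that is not installed is reported as NA.

```r
# Packages whose versions seem relevant here (an assumption, extend as needed)
pkgs <- c("caret", "kernlab")

# Look up each installed version; NA if the package is not installed
versions <- vapply(pkgs, function(p) {
  if (requireNamespace(p, quietly = TRUE)) {
    as.character(utils::packageVersion(p))
  } else {
    NA_character_
  }
}, character(1))

print(R.version.string)  # version of R itself
print(versions)          # named character vector of package versions
```

Running this on both my computer and the teacher's should show exactly which versions differ.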
Thanks in advance.