I'm doing classification with svm from the e1071 package. The data looks like this:
library(dplyr)   # for the %>% pipe
dtm[140:145] %>% str()
Observations: 387
'data.frame': 387 obs. of 4 variables:
$ comes: num 0 0 0 0 0 0 0 0 0 0 ...
$ able : num 0 0 0 0 0 0 0 0 0 0 ...
$ hours: num 0 0 0 0 0 0 0 0 0 0 ...
$ type : Factor w/ 4 levels "-1","0","1","9": 3 3 4 4 4 3 3 3 3 4 ...
The goal is to predict type from all the other variables in dtm. The problem is that the results are very poor (accuracy ~0.6). This is because my sample is unbalanced:
prop.table(table(dtm$type))
-1 0 1 9
0.025839793 0.005167959 0.180878553 0.788113695
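(These proportions are also what I base the class.weights further down on. Below is a sketch of one common inverse-frequency recipe; it is only illustrative and does not reproduce the exact weight values I pass to svm() later.)
# sketch: inverse-frequency class weights (illustrative, not my exact values)
tab <- table(dtm$type)
wts <- nrow(dtm) / (nlevels(dtm$type) * tab)
wts   # rare classes ("-1", "0") get much larger weights than "9"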
So I tried to tune the model based on this approach. Without any tuning, the results look like this:
library(e1071)   # svm(), tune()
library(caret)   # confusionMatrix(), createDataPartition()
x <- subset(dtm, select=-type)
y <- dtm$type
classifier <- svm(type~., data = dtm)
pred <- predict(classifier, x)   # predicting on the same data the model was fit on
confusionMatrix(pred, y)
Reference
Prediction -1 0 1 9
-1 0 0 0 0
0 0 2 0 0
1 1 0 54 1
9 9 0 16 304
With tuning, the results look like this (I also set class.weights based on prop.table(table(dtm$type)); svm_tune indicated that the best gamma would be 0.5 and the best cost would be 1):
svm_tune <- tune(svm, train.x=x, train.y=y,
class.weights = c("-1" = 9.439025,
"0" = 22.76471,
"1" = 2.866667,
"9" = 1.994845),
ranges=list(cost=10^(-1:2),
gamma=c(.5,1,2)))
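# (side note: the cost/gamma used below can be read off the tuning object)
summary(svm_tune)
svm_tune$best.parameters   # reports gamma = 0.5, cost = 1 for my data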
svm_model_after_tune <- svm(type ~ ., data=dtm, cost=1, gamma=0.5,
class.weights = c("-1" = 9.439025,
"0" = 22.76471,
"1" = 2.866667,
"9" = 1.994845))
pred_tuned <- predict(svm_model_after_tune,x)
confusionMatrix(pred_tuned,y)
Reference
Prediction -1 0 1 9
-1 10 0 0 0
0 0 2 0 0
1 0 0 70 0
9 0 0 0 305
As you can see, it works perfectly. But when I try to apply the model in a train/test split, the results are not much better than before. Actually, they are worse:
train_index <- createDataPartition(dtm$type, p=0.75, list=FALSE)
train <- dtm[train_index,]
test <- dtm[-train_index,]
classifier <- svm(type~.,
data = train,
kernel = "radial",
cross = 10,
cost = 1,
gamma = 0.5,
class.weights = c("-1" = 9.439025,
"0" = 22.76471,
"1" = 2.866667,
"9" = 1.994845))
pred <- predict(classifier, newdata=test)
confusionMatrix(pred, test$type)
Reference
Prediction -1 0 1 9
-1 0 0 0 0
0 0 0 0 0
1 0 0 0 0
9 10 4 33 48
The question is: how can I apply svm_model_after_tune in a way that I actually benefit from it?
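To make the question concrete, this is roughly the workflow I am aiming for, with the tuning done on the training split only and the resulting best model evaluated on the held-out part (a sketch; the names svm_tune_tr and pred_test are just placeholders, and whether this is the right way to use tune() here is exactly what I am unsure about):
# tune only on the training split ...
train_index <- createDataPartition(dtm$type, p = 0.75, list = FALSE)
train <- dtm[train_index, ]
test  <- dtm[-train_index, ]
svm_tune_tr <- tune(svm, type ~ ., data = train,
                    class.weights = c("-1" = 9.439025, "0" = 22.76471,
                                      "1" = 2.866667, "9" = 1.994845),
                    ranges = list(cost = 10^(-1:2), gamma = c(.5, 1, 2)))
# ... and judge the best model on the held-out split
pred_test <- predict(svm_tune_tr$best.model, newdata = test)
confusionMatrix(pred_test, test$type)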