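In my data, the last column gives the state of each sample: diseased (1) or disease-free (0). The goal is to classify the test samples as either diseased (1) or disease-free (0), but the predictions come out as "0.2189325" and "0.1674805" instead of 0 or 1.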
sample.train.data <- structure(list(V1 = c(0.0504799681418526, 0.0674893975400467),
V2 = c(0.375190991689635, 2.62836587379837e-07), V3 = c(0,
0), V4 = c(0, 0), V5 = c(0, 0.123349117705797), V6 = c(0,
0), V7 = c(0.0575526864592394, 4.0318003466356e-08), V8 = c(0,
0), V9 = c(0, 0.0819121309767076), V10 = c(0.0837245737400836,
5.8652477615664e-08), V11 = c(0, 0), V12 = c(0, 0), V13 = c(0,
0), V14 = c(0, 0), V15 = c(0, 0), V16 = c(0, 0), V17 = c(0,
0), V18 = c(0.0115973088249164, 8.12438769013043e-09), V19 = c(0,
0), V20 = c(0, 0), V21 = c(0, 0.0642970332370127), V22 = c(0,
0), V23 = c(0, 0), V24 = c(0, 0), V25 = c(0, 0), V26 = c(0,
0), V27 = c(0, 0), V28 = c(0, 0), V29 = c(0, 0), V30 = c(0,
0), V31 = c(0, 0.100087661334886), V32 = c(0, 0), V33 = c(0,
0), V34 = c(0.132277333556899, 9.2665665514059e-08), V35 = c(0.00157299602821123,
1.1019478536923e-09), V36 = c(0.121318235645494, 0.162196905737495
), V37 = c(0, 0), V38 = c(0.0661915890298985, 0.088495112621564
), V39 = c(0.10009431688377, 0.133821501722926), V40 = c(0,
0.039928021903824), V41 = c(0, 0), V42 = c(0, 0), V43 = c(0,
0), V44 = c(0, 0), V45 = c(0, 0.105729116180691), V46 = c(0,
0), V47 = c(0, 0), V48 = c(0, 0), V49 = c(0, 0), V50 = c(0,
0.0230295773750142), V51 = c(0, 0.00966395996496688), V52 = c(0,
0), V53 = c(0, 0), V54 = c(0, 1)), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12",
"V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", "V21",
"V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", "V30",
"V31", "V32", "V33", "V34", "V35", "V36", "V37", "V38", "V39",
"V40", "V41", "V42", "V43", "V44", "V45", "V46", "V47", "V48",
"V49", "V50", "V51", "V52", "V53", "V54"), row.names = 1:2, class = "data.frame")
sample.test.data <- structure(list(V1 = c(0, 0.0502553931936882), V2 = c(0.32474835570625,
0.373521844489033), V3 = c(0, 0), V4 = c(0, 0), V5 = c(0.0798572088141946,
0.09185084822725), V6 = c(0, 0), V7 = c(0, 0), V8 = c(0.0913439079721602,
4.76496954607063e-08), V9 = c(0, 0), V10 = c(0.0724682048784116,
0.0833521004105655), V11 = c(0, 0), V12 = c(0, 0.00380492674778399
), V13 = c(0, 0), V14 = c(0.0300930020345612, 1.56980625668248e-08
), V15 = c(0.022461356489053, 1.17170024810405e-08), V16 = c(0.037002165179523,
0.0425594671846318), V17 = c(0, 0), V18 = c(0.0100381060711198,
5.23639406184491e-09), V19 = c(0, 0), V20 = c(0, 0), V21 = c(0,
0), V22 = c(0, 0), V23 = c(0, 0), V24 = c(0, 0), V25 = c(0, 0.0150866858339266
), V26 = c(0, 0.0282083101023333), V27 = c(0, 0), V28 = c(0,
0), V29 = c(0, 0), V30 = c(0, 0), V31 = c(0, 0.0745294069522065
), V32 = c(0, 0), V33 = c(0, 0), V34 = c(0.114493278147107, 0.131688859030858
), V35 = c(0, 0), V36 = c(0.105007578581866, 5.47773710537665e-08
), V37 = c(0, 0), V38 = c(0, 0), V39 = c(0.0866371142792093,
0.0996490179492987), V40 = c(0.0258497218465435, 1.34845486806539e-08
), V41 = c(0, 0), V42 = c(0, 0), V43 = c(0, 0), V44 = c(0, 0.00549299131535034
), V45 = c(0, 0), V46 = c(0, 0), V47 = c(0, 0), V48 = c(0, 0),
V49 = c(0, 0), V50 = c(0, 0), V51 = c(0, 0), V52 = c(0, 0
), V53 = c(0, 0), V54 = c(0, 0)), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12",
"V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", "V21",
"V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", "V30",
"V31", "V32", "V33", "V34", "V35", "V36", "V37", "V38", "V39",
"V40", "V41", "V42", "V43", "V44", "V45", "V46", "V47", "V48",
"V49", "V50", "V51", "V52", "V53", "V54"), row.names = 81:82, class = "data.frame")
library(e1071)  # provides svm()
disease.col <- paste("V", ncol(sample.train.data), sep= '')
f <- paste(disease.col, " ~ . ", sep="")
svm.model <- svm(as.formula(f), data=sample.train.data, cost=100, gamma=1)
svm.pred <- predict(svm.model, sample.test.data[, -ncol(sample.test.data)])
comp.table <- table(pred=svm.pred, true = sample.test.data[, ncol(sample.test.data)])
print(comp.table)
Output:
                  true
pred               0
  0.16748052821151 1
  0.21893247843041 1
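As you can see, the predicted values are 0.167 and 0.218, whereas the samples should be classified as 0 or 1, which is also how the SVM's training data is labeled.
Note: I have only copied an excerpt here. The actual training data has 80 samples and the test data has 20 samples; the data above is just two samples from each set. Also, with the actual data, creating svm.model does not generate the warning message.
I have tried different cost and gamma values for the svm model and different combinations of data, and even when the test data contains both sample states (0, 1) I still get similar results. If anyone can tell me what I am doing wrong, I would really appreciate it.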
Answer 0 (score: 2)
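Your response variable needs to be a factor to trigger classification behavior. When the response is numeric, svm() from e1071 defaults to regression (eps-regression), which is why the predictions are real numbers rather than class labels. In your example: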
sample.train.data$V54<-factor(sample.train.data$V54)
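This converts V54 from numeric to a factor. You can then run the rest of your code exactly as before, and predict() will return the class labels 0 or 1.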
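To make the difference concrete, here is a minimal sketch that trains the same kind of model twice, once with a numeric response and once with a factor response. The data frame `toy`, its columns, and the train/test split are made up for illustration and are not the asker's data; only the `svm()` call with `cost=100, gamma=1` mirrors the question.

```r
library(e1071)

# Hypothetical toy data (NOT the asker's): two numeric features and a
# binary 0/1 label in the last column, as in the question's layout.
set.seed(42)
n <- 40
toy <- data.frame(
  V1 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 3)),
  V2 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 3)),
  V3 = rep(c(0, 1), each = n / 2)
)
train.idx <- c(1:15, 21:35)   # 15 samples of each class for training
train <- toy[train.idx, ]
test  <- toy[-train.idx, ]    # remaining 5 of each class for testing

# Build the formula "<last column> ~ ." the same way the question does.
label.col <- paste0("V", ncol(train))
f <- as.formula(paste(label.col, "~ ."))

# Numeric response: svm() fits a regression model, so predict()
# returns real-valued numbers.
reg.model <- svm(f, data = train, cost = 100, gamma = 1)
reg.pred  <- predict(reg.model, test[, -ncol(test)])
print(is.numeric(reg.pred))

# Factor response: svm() fits a classifier, so predict()
# returns a factor with the levels "0" and "1".
train[[label.col]] <- factor(train[[label.col]], levels = c(0, 1))
cls.model <- svm(f, data = train, cost = 100, gamma = 1)
cls.pred  <- predict(cls.model, test[, -ncol(test)])
print(is.factor(cls.pred))

# Confusion table of predicted vs. true labels.
print(table(pred = cls.pred, true = test[[label.col]]))
```

With a factor response, the rows of the confusion table are class labels rather than individual predicted numbers, which is the table the question was expecting.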