Tuning SVM in R - Dependent variable has wrong type

Time: 2016-07-16 17:43:07

Tags: r svm

I am using svm from the e1071 package on a dataset like this:

sdewey <- svm(x = as.matrix(trainS), 
              y = trainingSmall$DEWEY,
              type="C-classification")

This works fine, but when I try to tune cost and gamma like this:

svm_tune <- tune(svm, train.x=as.matrix(trainS), train.y=trainingSmall$DEWEY, type="C-classification",    ranges=list(cost=10^(-1:6), gamma=1^(-1:1)))

I get this error:

  

Error in tune(svm, train.x = as.matrix(trainS), train.y = trainingSmall$DEWEY, : Dependent variable has wrong type!

My training data is structured like this, but with many more lines:

'data.frame':   1000 obs. of  1542 variables:
 $ women.prisoners                                  : int  1 0 0 0 0 0 0 0 0 0 ...
 $ reformatories.for.women                          : int  1 0 0 0 0 0 0 0 0 0 ...
 $ women                                            : int  1 0 0 0 0 0 0 0 0 0 ...
 $ criminal.justice                                 : int  1 0 0 0 0 0 0 0 0 0 ...
 $ soccer                                           : int  0 1 0 0 0 0 0 0 0 0 ...
 $ coal.mines.and.mining                            : int  0 0 1 0 0 0 0 0 0 0 ...
 $ coal                                             : int  0 0 1 0 0 0 0 0 0 0 ...
 $ engineering.geology                              : int  0 0 1 0 0 0 0 0 0 0 ...
 $ family.violence                                  : int  0 0 0 1 0 0 0 0 0 0 ...

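This is a multiclass problem. I am not sure how to fix this, or whether there is another way to find the best values for the cost and gamma parameters.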

Here is an example of my data, and trainS is the data without the first 4 columns (DEWEY, D1, D2 and D3).

Thanks

1 answer:

Answer 0 (score: 1)

library(e1071)

trainingSmall <- read.csv("trainingSmallExtra.csv")

# Drop the first 4 columns (DEWEY, D1, D2, D3), as trainS does in the question
sdewey <- svm(x      = as.matrix(trainingSmall[, -(1:4)]),
              y      = trainingSmall$DEWEY,
              type   = "C-classification",
              kernel = "linear"  # no kernel transformation; svm()'s default is "radial"
              )

This works because svm automatically converts DEWEY to a factor.
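The type of the response column can be checked directly before it is handed to either function. The sketch below uses base R only; the vector dewey and its values are made up for illustration and merely stand in for trainingSmall$DEWEY.

```r
# Made-up integer class codes, standing in for trainingSmall$DEWEY
dewey <- c(300L, 700L, 300L, 600L, 700L)

is.factor(dewey)    # FALSE: integers are not a factor
class(dewey)        # "integer"

# Converting to a factor turns each distinct code into a class level
dewey_f <- as.factor(dewey)

is.factor(dewey_f)  # TRUE
levels(dewey_f)     # "300" "600" "700"
```

Running str(trainingSmall$DEWEY) on the real data shows the same information in one line.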

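The tune call fails because tune is designed to be customized by the user and relies on you supplying the correct data type. Since DEWEY is an integer rather than a factor, it fails. We can fix this: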

trainingSmall$DEWEY <- as.factor(trainingSmall$DEWEY)

# Formatting follows Google's R style guide
svm_tune <- tune(svm,
                 train.x = as.matrix(trainingSmall[, -(1:4)]),  # drop DEWEY, D1, D2, D3
                 train.y = trainingSmall$DEWEY,                 # now a factor
                 kernel  = "linear",
                 type    = "C-classification",
                 ranges  = list(
                   cost  = 10^(-1:6),
                   gamma = 1^(-1:1)  # evaluates to c(1, 1, 1); not used by the linear kernel
                 )
                 )
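Note that gamma has no effect with kernel = "linear", and 1^(-1:1) evaluates to 1 1 1, so the grid above only really varies cost. If the goal is to tune both parameters, something along the following lines may be closer to it. This is only a sketch: it uses the built-in iris data as a stand-in for the Dewey data, and the grid values, the radial kernel and the 5-fold setting are assumptions, not recommendations from the original answer.

```r
library(e1071)

set.seed(1)                   # cross-validation folds are random

x <- as.matrix(iris[, 1:4])   # numeric predictors
y <- iris$Species             # response; already a factor

stopifnot(is.factor(y))       # tune() needs a factor for classification error

fit <- tune(svm,
            train.x = x,
            train.y = y,
            type    = "C-classification",
            kernel  = "radial",                  # gamma is used by this kernel
            ranges  = list(
              cost  = 10^(-1:2),                 # 0.1, 1, 10, 100
              gamma = 10^(-2:0)                  # 0.01, 0.1, 1
            ),
            tunecontrol = tune.control(cross = 5))  # 5-fold cross-validation

fit$best.parameters    # cost/gamma pair with the lowest CV error
fit$best.performance   # the corresponding misclassification rate
best_model <- fit$best.model
```

summary(fit) prints the error for every cost/gamma combination, which helps to see whether the best value sits at the edge of the grid and the range should be widened.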