Correlation-based feature selection (CFS) in R using the FSelector package and cross-validation

Time: 2017-11-03 10:11:36

Tags: r feature-selection

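I want to apply CFS with 10-fold cross-validation to select the important features in a dataset. My original dataset, ka, contains 71 independent variables and one target variable with 2 class levels. I chose an SVM model to test the accuracy of the selected features. I got this error:

    Error in eval(predvars, data, env) : numeric 'envir' arg not of length one
    Error during wrapup: cannot open the connection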


1 answer:

Answer 0: (score: 1)

With a filter method we do not need cross-validation, because the filter is independent of the classifier and does not introduce any variance:

    library(caret)
    library(e1071)
    library(FSelector)

    # `data` is your data frame (ka in the question); the target must be
    # a factor so that svm() performs classification
    data$Cardio1M <- as.factor(data$Cardio1M)

    # split the data into training (70%) and test (30%) sets
    set.seed(10)
    trainIndex <- createDataPartition(data$Cardio1M, p = 0.7, list = FALSE)
    data_train <- data[trainIndex, ]
    data_test  <- data[-trainIndex, ]

    # select relevant, non-redundant features with CFS (training data only)
    subset <- cfs(Cardio1M ~ ., data_train)

    # keep the selected features plus the target
    train <- data_train[, c(subset, "Cardio1M")]

    # train the SVM on the selected features
    svm_model <- svm(Cardio1M ~ ., data = train, cost = 0.1, kernel = "radial")

    # optional: tune the SVM hyperparameters
    # tuned <- tune(svm, Cardio1M ~ ., data = train, ranges = list(cost = c(0.001, 0.01, 0.1, 1, 100)))

    # predict on the test set, using only the selected features
    p <- predict(svm_model, data_test[, subset, drop = FALSE])

    # accuracy of the model
    accuracy <- mean(p == data_test$Cardio1M)
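
If you still want the 10-fold estimate mentioned in the question, one option is to run the filter inside each fold. The following is only a minimal sketch under stated assumptions, not a definitive implementation: a two-class subset of the built-in `iris` data stands in for the asker's data frame, `Species` plays the role of `Cardio1M`, and the names `target`, `folds`, `results` and `fold_accuracy` are made up for illustration.

```r
library(caret)      # createFolds()
library(e1071)      # svm()
library(FSelector)  # cfs()

set.seed(10)

# Stand-in data (an assumption): a two-class subset of iris replaces the
# asker's data, and Species plays the role of Cardio1M.
data   <- droplevels(iris[iris$Species != "setosa", ])
target <- "Species"
form   <- as.formula(paste(target, "~ ."))

# 10 stratified folds; each list element holds the held-out row indices
folds <- createFolds(data[[target]], k = 10)

results <- lapply(folds, function(test_idx) {
  train_fold <- data[-test_idx, ]
  test_fold  <- data[test_idx, ]

  # Run CFS on the training part only, so the held-out rows never
  # influence which features are chosen
  selected <- cfs(form, train_fold)

  # Fit the SVM on the selected features plus the target
  model <- svm(form,
               data = train_fold[, c(selected, target), drop = FALSE],
               cost = 0.1, kernel = "radial")

  # Evaluate on the held-out fold, using the same selected features
  pred <- predict(model, test_fold[, selected, drop = FALSE])
  list(selected = selected,
       accuracy = mean(pred == test_fold[[target]]))
})

# Per-fold accuracy and its mean
fold_accuracy <- sapply(results, function(r) r$accuracy)
mean(fold_accuracy)

# How often each feature was selected across the 10 folds
sort(table(unlist(lapply(results, function(r) r$selected))),
     decreasing = TRUE)
```

Features that are selected in most folds are reasonable candidates for a final subset, and the mean fold accuracy gives a rough idea of how well an SVM trained on CFS-selected features generalizes. To use it on your own data, replace `data` and `target` with your data frame and `"Cardio1M"`.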