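I want to apply CFS with 10-fold cross-validation to select the important features in my dataset. My original dataset contains 71 independent variables and one target variable with 2 class levels. I chose an SVM model to test the accuracy of the selected features. I got this error: Error in eval(predvars, data, env) : numeric 'envir' arg not of length one. It was followed by a second error: cannot open the connection.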
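Answer 0: (score: 1)
With a filter method we do not need cross-validation for the selection step, because the filter is independent of the classifier and does not introduce any variation.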
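For reference, the merit score that CFS assigns to a candidate feature subset, in its usual textbook form, depends only on correlations measured in the data, which is why no classifier enters the selection. The symbols below are the standard ones rather than names from the code in this answer, and FSelector's `cfs()` may estimate the correlations differently in detail.

```latex
\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}
```

Here $k$ is the number of features in the subset $S$, $\overline{r_{cf}}$ is the mean feature-class correlation, and $\overline{r_{ff}}$ is the mean feature-feature correlation. A subset scores well when its features correlate with the class but not with each other.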
library(caret)
library(e1071)
library(FSelector)
# fix the seed before the random split so the partition is reproducible
set.seed(10)
# split data into train and test
trainIndex <- createDataPartition(data$Cardio1M, p=0.7, list=FALSE)
data_train <- data[ trainIndex,]
data_test <- data[-trainIndex,]
# select relevant, non-redundant features with CFS, using the training set only
subset <- cfs(Cardio1M~., data_train)
# cfs() returns a character vector of column names; keep those columns plus the target
train <- data_train[, c(subset, "Cardio1M")]
# train the SVM on the selected features (Cardio1M must be a factor for classification)
svm_model <- svm(Cardio1M~., train, cost=.1, kernel="radial")
# optional: tune the SVM hyperparameters
#tuned <- tune(svm, Cardio1M~., data=train, ranges=list(cost=c(0.001,0.01,.1,1,100)))
# predict the test set: model first, then the new data restricted to the selected features
p <- predict(svm_model, data_test[, subset, drop=FALSE])
# accuracy of the model on the held-out test set
accuracy <- mean(p == data_test$Cardio1M)
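The commented-out tuning step can be sketched with `tune()` from e1071, which by default runs 10-fold cross-validation over the supplied parameter grid. This is a minimal sketch, not the original poster's setup: the two-class subset of the built-in `iris` data stands in for the selected-feature training and test sets, and `toy`, `idx`, and `bestmodel` are illustrative names.

```r
library(e1071)

# Stand-in data (an assumption): two classes of the built-in iris set
# play the role of the selected-feature training and test sets.
set.seed(10)
toy <- droplevels(iris[iris$Species != "setosa", ])
idx <- sample(nrow(toy), size = round(0.7 * nrow(toy)))
train <- toy[idx, ]
test  <- toy[-idx, ]

# Grid search over cost; tune() uses 10-fold cross-validation by default.
tuned <- tune(svm, Species ~ ., data = train,
              kernel = "radial",
              ranges = list(cost = c(0.001, 0.01, 0.1, 1, 100)))

# With the default tune.control(), the best model is refit on all of `train`.
bestmodel <- tuned$best.model

# predict() takes the model first, then the new data.
p <- predict(bestmodel, newdata = test[, names(test) != "Species"])
accuracy <- mean(p == test$Species)

print(tuned$best.parameters)
print(accuracy)
```

Here cross-validation is used to choose the SVM's cost, not to select features, so it fits alongside a filter such as CFS applied once to the training set.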