I have a minimal working example that uses the PimaIndiansDiabetes dataset.
#load required library
library(mlbench)
#load Pima Indian Diabetes dataset
data(PimaIndiansDiabetes)
#set seed to ensure reproducible results
set.seed(42)
#split into training and test sets
PimaIndiansDiabetes[,train] <- ifelse(runif(nrow(PimaIndiansDiabetes))
<0.8,1,0)
#separate training and test sets
trainset <- PimaIndiansDiabetes[PimaIndiansDiabetes$train==1,]
testset <- PimaIndiansDiabetes[PimaIndiansDiabetes$train==0,]
#get column index of train flag
trainColNum <- grep(“train”,names(trainset))
#remove train flag column from train and test sets
trainset <- trainset[,-trainColNum]
testset <- testset[,-trainColNum]
#get column index of predicted variable in dataset
typeColNum <- grep(“diabetes”,names(PimaIndiansDiabetes))
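My current problem is splitting the data into training and test sets with the ifelse function, using the probability specified in the R code.
Answer 0: (score: 1)
The error is in this line: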
PimaIndiansDiabetes[,train] <- ifelse(runif(nrow(PimaIndiansDiabetes))
<0.8,1,0)
The ifelse call itself works fine:
ifelse(runif(nrow(PimaIndiansDiabetes))
<0.8,1,0)
But you have to use a string to name the new column ('train' instead of train):
PimaIndiansDiabetes[,'train'] <- ifelse(runif(nrow(PimaIndiansDiabetes))
<0.8,1,0)
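To see why the quotes matter: inside the square brackets an unquoted train is evaluated as a variable, so R fails before any assignment happens. The following is a minimal sketch, assuming a small made-up data frame df (hypothetical, standing in for PimaIndiansDiabetes) and a helper variable flag that is not part of the original code:

```r
# Hypothetical stand-in for PimaIndiansDiabetes (made-up values)
df <- data.frame(glucose = c(148, 85, 183, 89, 137))

set.seed(42)
flag <- ifelse(runif(nrow(df)) < 0.8, 1, 0)

# Unquoted name: R evaluates `train` as an object, which is not a valid
# column index here, so the assignment raises an error
unquoted_ok <- tryCatch({
  df[, train] <- flag
  TRUE
}, error = function(e) FALSE)

# Quoted name: creates a new column called "train"
df[, "train"] <- flag
```

After running this, unquoted_ok records that the unquoted form failed, while df has gained a train column of 0/1 flags.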
The next thing that does not work is the lookup of 'trainColNum'; you can do it like this:
trainColNum <- which(colnames(PimaIndiansDiabetes) == 'train')
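One practical difference between the two lookups is that grep matches a pattern anywhere in a name, while the == comparison matches only the exact name. A small sketch with made-up column names (cols and the "trained" entry are hypothetical, not taken from the dataset):

```r
# Hypothetical column names; "trained" is invented to show partial matching
cols <- c("glucose", "trained", "train")

# Pattern match: hits every name that contains "train"
grep_idx <- grep("train", cols)

# Exact match: hits only the column named exactly "train"
exact_idx <- which(cols == "train")
```

If a data frame ever gains a second column whose name contains the same text, the exact comparison still returns a single index, which is what the negative indexing in the question relies on.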
Alternatively, use the dplyr package to remove the column:
library(dplyr)
trainset <- trainset %>% select(-train)
testset <- testset %>% select(-train)
The same goes for the diabetes column:
typeColNum <- which(colnames(PimaIndiansDiabetes) == 'diabetes')
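Putting the pieces together, the whole split could look roughly like the sketch below. It is only an illustration: pima is a hypothetical two-column data frame with random values, used so the code runs without mlbench, and the flag column is dropped by name in base R rather than with dplyr.

```r
# Hypothetical stand-in for PimaIndiansDiabetes (random values, two columns only)
set.seed(42)
pima <- data.frame(
  glucose  = round(runif(100, 60, 200)),
  diabetes = factor(sample(c("neg", "pos"), 100, replace = TRUE))
)

# Flag roughly 80% of the rows for training; note the quoted column name
pima[, "train"] <- ifelse(runif(nrow(pima)) < 0.8, 1, 0)

# Separate training and test sets
trainset <- pima[pima$train == 1, ]
testset  <- pima[pima$train == 0, ]

# Drop the flag column by name; drop = FALSE keeps the result a data frame
trainset <- trainset[, names(trainset) != "train", drop = FALSE]
testset  <- testset[, names(testset) != "train", drop = FALSE]

# Column index of the predicted variable
typeColNum <- which(colnames(pima) == "diabetes")
```

With the real dataset, replacing pima by PimaIndiansDiabetes (after library(mlbench) and data(PimaIndiansDiabetes)) should give the same structure.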