Question

    I have a dataset with 90 rows and 5 columns, of which 4 are independent variables and one is the dependent variable. I need to split the dataset into train and test sets using leave-one-out cross-validation. For example: the 90th row as train and all the rest as test, then the 89th row as train and all the rest as test, and so on.


  Below is the code I tried; it's not working.

K = 90
folds <- rep_len(1:nrFolds, nrow(data))

# actual cross validation
for(k in 1:nrFolds) {
  # actual split of the data
  print(k)
  fold <- which(folds == k)
  data.train <- data[-fold,]
  dim(data.train)
  data.test <- data[fold,]
  dim(data.test)

}

Any help would be appreciated. After this, I need to pass the train and test sets to a classifier for training and testing. Thanks.

Answer 1

If I understand you correctly (I used the mtcars dataset because you did not provide data with your question):

res <- lapply(1:(nrow(mtcars) - 1), function(n) {
  train_idx <- sample(1:nrow(mtcars), n)
  list(train = mtcars[train_idx, ], test = mtcars[-train_idx, ])
})

This generates the following list:

str(res, max.level = 2)
List of 31
 $ :List of 2
  ..$ train:'data.frame': 1 obs. of 11 variables:
  ..$ test :'data.frame': 31 obs. of 11 variables:
 $ :List of 2
  ..$ train:'data.frame': 2 obs. of 11 variables:
  ..$ test :'data.frame': 30 obs. of 11 variables:
 ...
 $ :List of 2
  ..$ train:'data.frame': 30 obs. of 11 variables:
  ..$ test :'data.frame': 2 obs. of 11 variables:
 $ :List of 2
  ..$ train:'data.frame': 31 obs. of 11 variables:
  ..$ test :'data.frame': 1 obs. of 11 variables:

where each item contains the requested train and test df. As others have pointed out, this produces a different combination of observations on every run (maybe call set.seed() first?). I have also not seen this kind of split before.
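For comparison, conventional leave-one-out cross-validation holds out exactly one row per iteration and trains on the other n - 1 rows. The loop in the question seems to do this already once nrFolds is defined as nrow(data). Below is a minimal sketch of that reading, using mtcars as stand-in data. The names loo, pred and loo_accuracy are made up for illustration, and the logistic regression on am ~ wt is only an assumed placeholder for whatever classifier is actually intended.

```r
# Leave-one-out splits: iteration i holds out row i as the test set
# and trains on the remaining n - 1 rows.
data <- mtcars            # stand-in data; replace with your own data frame
n <- nrow(data)

loo <- lapply(seq_len(n), function(i) {
  list(train = data[-i, , drop = FALSE],
       test  = data[i, , drop = FALSE])
})

# Placeholder classifier (an assumption, not part of the question):
# logistic regression predicting the binary column `am` from `wt`.
pred <- vapply(loo, function(s) {
  fit <- glm(am ~ wt, data = s$train, family = binomial)
  p <- predict(fit, newdata = s$test, type = "response")
  as.numeric(p > 0.5)     # 1 if the predicted probability exceeds 0.5
}, numeric(1))

# Share of held-out rows that were classified correctly
loo_accuracy <- mean(pred == data$am)
```

Because every row is held out exactly once, this split is deterministic and needs no random seed.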

Answer 2

The following code puts a randomly selected 70% of the data into the training set and the remaining 30% into the test set.

data<-read.csv("c:/datafile.csv")

dt = sort(sample(nrow(data), nrow(data)*.7))
train<-data[dt,]
test<-data[-dt,]
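The same 70/30 split can be made reproducible and runnable without the CSV file. This is a sketch only: mtcars stands in for the file, and the seed value 42 and the names n_train and train_idx are arbitrary choices made for illustration.

```r
set.seed(42)                         # arbitrary seed, only for reproducibility
data <- mtcars                       # stand-in for read.csv("c:/datafile.csv")

n_train <- floor(0.7 * nrow(data))   # whole number of training rows
train_idx <- sample(seq_len(nrow(data)), size = n_train)

train <- data[train_idx, ]           # the sampled rows
test  <- data[-train_idx, ]          # every row that was not sampled
```

Using floor() makes the training-set size explicit instead of relying on sample() to truncate a fractional size.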

Here is another good and fairly general example.

library(ISLR)
attach(Smarket)
smp_siz = floor(0.75*nrow(Smarket))  # training-set size: 75% of the number of rows in the dataset
smp_siz  # shows the value of the sample size

set.seed(123)   # set the seed so the same random numbers are generated on every run
train_ind = sample(seq_len(nrow(Smarket)),size = smp_siz)  # randomly picks smp_siz row numbers from Smarket and stores them in train_ind
train =Smarket[train_ind,] # creates the training dataset from the rows listed in train_ind
test=Smarket[-train_ind,]  # creates the test dataset from all remaining rows

require(caTools)  # loads the caTools package

set.seed(123)   # set the seed so the same random numbers are generated on every run
sample = sample.split(Smarket$Direction,SplitRatio = 0.75) # sample.split() expects a vector of labels, so pass a column rather than the whole data frame; rows chosen for training are marked TRUE and the rest FALSE
train1 =subset(Smarket,sample ==TRUE) # creates a training dataset named train1 from the rows marked TRUE
test1=subset(Smarket, sample==FALSE)  # creates a test dataset named test1 from the rows marked FALSE

https://rpubs.com/ID_Tech/S1

Also, see this.

https://edumine.wordpress.com/2015/04/06/splitting-a-data-frame-into-training-and-testing-sets-in-r/

How to split a dataset into all possible combinations of test and train in R?
