How do I save randomly generated training and test data sets?

Asked: 2018-11-07 09:32:17

Tags: r

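I am using a for loop to generate 100 different training and test sets.

What I want to do now is save these 100 different training and test sets, so that I can look at, for example, the ones from iteration 17.

This code shows my program, with the for loop and the split into training and test sets: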

result_df<-matrix(ncol=3,nrow=100)
colnames(result_df)<-c("Acc","Sens","Spec")

for (g in 1:100 )
{

  # Divide into Train and test set
  smp_size <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test <- mydata1[-train_ind, ]

  # REST OF MY CODE (this is where cm is created)

  # Calculate some statistics

  overall <- cm$overall
  overall.accuracy <- format(overall['Accuracy'] * 100, nsmall =2, digits = 2)
  overall.sensitivity <- format(cm$byClass['Sensitivity']* 100, nsmall =2, digits = 2)
  overall.specificity <- format(cm$byClass['Specificity']* 100, nsmall =2, digits = 2)

  result_df[g,1] <- overall.accuracy
  result_df[g,2] <- overall.sensitivity
  result_df[g,3] <- overall.specificity

}

How can I do this?

4 Answers:

Answer 0: (score: 1)

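You can save them to CSV files inside the loop. Putting the iteration number g in the file name (rather than Sys.time(), which contains colons and can repeat within one second) lets you find a given iteration again later:

write.csv(train, file = paste0("train-", g, ".csv"))
write.csv(test, file = paste0("test-", g, ".csv"))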
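
As an alternative to CSV, here is a minimal sketch that writes each split to an .rds file, which keeps column types and row names intact, and then reads iteration 17 back. The toy mydata1, the out_dir folder, and the split-NNN.rds naming scheme are assumptions made for illustration and are not part of the original question.

```r
set.seed(1)  # assumption: fixed seed so the example is repeatable
mydata1 <- data.frame(x = rnorm(50), y = rnorm(50))  # toy stand-in for the real data

out_dir <- file.path(tempdir(), "splits")  # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

for (g in 1:100) {
  # Divide into train and test set, as in the question
  smp_size  <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  parts <- list(train = mydata1[train_ind, ], test = mydata1[-train_ind, ])

  # One file per iteration, zero-padded so the files sort in order
  saveRDS(parts, file.path(out_dir, sprintf("split-%03d.rds", g)))
}

# Load the sets from iteration 17 again
parts17 <- readRDS(file.path(out_dir, "split-017.rds"))
```

parts17$train and parts17$test are then the data frames that were used in iteration 17.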

Answer 1: (score: 1)

You could do the following, for example, and save each test and training set as an element of a list:

result_df<-matrix(ncol=3,nrow=100)
colnames(result_df)<-c("Acc","Sens","Spec")
testlist <- list()
trainlist <- list()
for (g in 1:100 )
{
  # Divide into Train and test set
  smp_size <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test <- mydata1[-train_ind, ]
  trainlist[[g]] <- train
  testlist[[g]] <- test
}

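Edit: To retrieve the 7th element of these lists, you can use trainlist[[7]] (and testlist[[7]] for the matching test set).

Answer 2: (score: 1)

Put the code into a function and call it with lapply():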

result_df <- matrix(ncol=3, nrow=100)
colnames(result_df)<-c("Acc", "Sens", "Spec")

SIMg <- function(g) {

  # Divide into Train and test set
  smp_size <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test <- mydata1[-train_ind, ]

  # REST OF THE CODE

  # add the results of this run to the returned list as well
  return(list(train = train, test = test))
}
L <- lapply(1:100, SIMg)

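The resulting list L has 100 elements. Each element is itself a list containing the two data frames and the results of one simulation run.
To get the separate lists trainlist and testlist, you can do:

trainlist <- lapply(L, '[[', "train")
testlist  <- lapply(L, '[[', "test")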
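
To make the function approach concrete, here is a small runnable sketch. The toy mydata1 and the stats vector (two means standing in for accuracy, sensitivity and specificity) are assumptions for illustration only, since the real model code is not shown in the question.

```r
set.seed(42)  # assumption: fixed seed so the example is repeatable
mydata1 <- data.frame(x = rnorm(50), y = rnorm(50))  # toy stand-in for the real data

SIMg <- function(g) {
  # Divide into train and test set
  smp_size  <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test  <- mydata1[-train_ind, ]

  # Placeholder statistics; the real code would compute Acc, Sens, Spec here
  stats <- c(MeanTrainY = mean(train$y), MeanTestY = mean(test$y))

  list(train = train, test = test, stats = stats)
}

L <- lapply(1:100, SIMg)

# Pull the pieces apart again
trainlist <- lapply(L, `[[`, "train")
testlist  <- lapply(L, `[[`, "test")

# Stack the per-run statistics into a 100-row matrix, one row per iteration
result_df <- do.call(rbind, lapply(L, `[[`, "stats"))
```

With this layout, row 17 of result_df belongs to the data in trainlist[[17]] and testlist[[17]], so results and data stay aligned by index.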

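Answer 3: (score: 1)

One option might be to save the row indices of each partition instead of saving all the data sets, and then select the row indices for the iteration of interest.

The caret package has a function called createDataPartition that will do this for you: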

library(caret)

df <- data.frame(col1 = rnorm(100), col2 = rnorm(100))

# create 100 partitions
train.idxs <- createDataPartition(1:nrow(df), times = 100, p = 0.8)

for (i in seq_along(train.idxs)) {
  # create train and test sets
  idx <- train.idxs[[i]]
  train.df <- df[idx, ]
  test.df <- df[-idx, ]

  # calculate statistics ...

  result_df[i,1] <- overall.accuracy
  result_df[i,2] <- overall.sensitivity
  result_df[i,3] <- overall.specificity
}

# check the datasets for the nth partition (set n first, e.g. n <- 17)
# train set
df[train.idxs[[n]], ]

# test set
df[-train.idxs[[n]], ]
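
If you would rather not depend on caret, the same idea can be sketched in base R by storing the output of sample() for each iteration. Note that this is plain random sampling as in the question, not the grouped sampling that createDataPartition performs, and the helper name get_split is hypothetical.

```r
set.seed(123)  # assumption: fixed seed so the example is repeatable
df <- data.frame(col1 = rnorm(100), col2 = rnorm(100))

n_train <- floor(0.8 * nrow(df))

# Keep only the training row indices of each of the 100 partitions
train.idxs <- replicate(100,
                        sample(seq_len(nrow(df)), size = n_train),
                        simplify = FALSE)

# Hypothetical helper: rebuild the train and test sets of partition n on demand
get_split <- function(n) {
  idx <- train.idxs[[n]]
  list(train = df[idx, ], test = df[-idx, ])
}

split17 <- get_split(17)
```

Storing 100 index vectors takes far less memory than storing 100 copies of the data, and the list can be written to disk with saveRDS(train.idxs, ...) if the splits need to survive the session.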