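I am using a for loop to generate 100 different training and test sets.
What I want to do now is save these 100 training and test sets so that I can look at a particular one later, for example the one from iteration 17.
This is my program, with the for loop and the split into training and test sets: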
result_df<-matrix(ncol=3,nrow=100)
colnames(result_df)<-c("Acc","Sens","Spec")
for (g in 1:100 )
{
# Divide into Train and test set
smp_size <- floor(0.8 * nrow(mydata1))
train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
train <- mydata1[train_ind, ]
test <- mydata1[-train_ind, ]
# ... rest of my code (model fitting; assumed to produce a caret confusionMatrix object cm) ...
# Calculate some statistics
overall <- cm$overall
overall.accuracy <- format(overall['Accuracy'] * 100, nsmall =2, digits = 2)
overall.sensitivity <- format(cm$byClass['Sensitivity']* 100, nsmall =2, digits = 2)
overall.specificity <- format(cm$byClass['Specificity']* 100, nsmall =2, digits = 2)
result_df[g,1] <- overall.accuracy
result_df[g,2] <- overall.sensitivity
result_df[g,3] <- overall.specificity
}
How can I do this?
Answer 0 (score: 1)
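You can save each set to a CSV file inside the loop. Use the loop counter g in the file name rather than Sys.time(): timestamps contain spaces and colons (colons are not allowed in Windows file names) and do not tell you which iteration a file belongs to. Also, paste0() has no sep argument, so sep = "" is not needed.
write.csv(train, file = paste0("train-", g, ".csv"))
write.csv(test, file = paste0("test-", g, ".csv"))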
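A self-contained sketch of the CSV approach follows. It assumes the built-in iris data as a hypothetical stand-in for mydata1 (which the question does not show) and uses tempdir() as the output folder. Keeping the row names in the file (the write.csv default) and restoring them with row.names = 1 preserves which original rows went into each set:

```r
# Hypothetical stand-in for the question's mydata1
mydata1 <- iris
out_dir <- tempdir()
set.seed(1)
smp_size <- floor(0.8 * nrow(mydata1))

for (g in 1:5) {
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  # One file pair per iteration, named by the loop counter g
  write.csv(mydata1[train_ind, ], file.path(out_dir, paste0("train-", g, ".csv")))
  write.csv(mydata1[-train_ind, ], file.path(out_dir, paste0("test-", g, ".csv")))
}

# Read back the sets of iteration 3; row.names = 1 restores the original row numbers
train3 <- read.csv(file.path(out_dir, "train-3.csv"), row.names = 1)
test3 <- read.csv(file.path(out_dir, "test-3.csv"), row.names = 1)
```

Note that factor columns come back as character columns unless you convert them again after reading.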
Answer 1 (score: 1)
You could, for example, store each training and test set as an element of a list:
result_df<-matrix(ncol=3,nrow=100)
colnames(result_df)<-c("Acc","Sens","Spec")
testlist <- list()
trainlist <- list()
for (g in 1:100 )
{
# Divide into Train and test set
smp_size <- floor(0.8 * nrow(mydata1))
train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
train <- mydata1[train_ind, ]
test <- mydata1[-train_ind, ]
trainlist[[g]] <- train
testlist[[g]] <- test
}
Edit:
To retrieve the 7th element of these lists, use trainlist[[7]] (and likewise testlist[[7]]).
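A self-contained sketch of this list approach, assuming iris as a hypothetical stand-in for mydata1. Preallocating the lists with vector("list", n) avoids growing them inside the loop. The saveRDS()/readRDS() step is an addition not in the original answer, shown as one way to keep the lists across R sessions:

```r
# Hypothetical stand-in for the question's mydata1
mydata1 <- iris
n_iter <- 100
trainlist <- vector("list", n_iter)
testlist <- vector("list", n_iter)
set.seed(42)
smp_size <- floor(0.8 * nrow(mydata1))

for (g in seq_len(n_iter)) {
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  trainlist[[g]] <- mydata1[train_ind, ]
  testlist[[g]] <- mydata1[-train_ind, ]
}

# Look at the sets from iteration 17
train17 <- trainlist[[17]]
test17 <- testlist[[17]]

# Optionally persist both lists in a single file and reload them later
rds_file <- file.path(tempdir(), "splits.rds")
saveRDS(list(train = trainlist, test = testlist), rds_file)
splits <- readRDS(rds_file)
```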
Answer 2 (score: 1)
Put your code into a function and then call lapply():
result_df <- matrix(ncol=3, nrow=100)
colnames(result_df)<-c("Acc", "Sens", "Spec")
SIMg <- function(g) {
# Divide into Train and test set
smp_size <- floor(0.8 * nrow(mydata1))
train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
train <- mydata1[train_ind, ]
test <- mydata1[-train_ind, ]
# ... rest of the code (model fitting, statistics) ...
return(list(train=train, test=test))  # add the statistics of this run as further list elements
}
L <- lapply(1:100, SIMg)
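The resulting list L has 100 elements; each one is itself a list containing the two data frames and the results of one simulation run.
To get the separate lists trainlist and testlist, you can do:
trainlist <- lapply(L, '[[', "train")
testlist <- lapply(L, '[[', "test")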
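A self-contained sketch of the function-plus-lapply() approach, assuming iris as a hypothetical stand-in for mydata1 and a hypothetical placeholder statistic (the share of "setosa" rows in the test set) in place of the question's model results. Calling set.seed(g) inside the function is a design choice not in the original answer: it makes every split reproducible on its own, so iteration 17 can also be regenerated without storing anything:

```r
# Hypothetical stand-in for the question's mydata1
mydata1 <- iris

SIMg <- function(g) {
  # Seeding with the iteration number makes each split reproducible on its own
  set.seed(g)
  smp_size <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test <- mydata1[-train_ind, ]
  # Hypothetical placeholder statistic: share of "setosa" rows in the test set
  setosa_share <- mean(test$Species == "setosa")
  list(train = train, test = test, setosa_share = setosa_share)
}

L <- lapply(1:100, SIMg)

# `[[` used as a function extracts the named element from every run
trainlist <- lapply(L, `[[`, "train")
testlist <- lapply(L, `[[`, "test")
# Scalar results can be collected into a plain numeric vector with sapply()
setosa_share <- sapply(L, `[[`, "setosa_share")

# Because of set.seed(g), iteration 17 can be rebuilt directly
run17 <- SIMg(17)
```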
Answer 3 (score: 1)
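One option is to save only the row indices of each partition instead of the full data sets, and then select the rows for the iteration you are interested in.
The caret package has a function called createDataPartition that does this for you: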
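library(caret)
df <- data.frame(col1 = rnorm(100), col2 = rnorm(100))
result_df <- matrix(ncol = 3, nrow = 100)
colnames(result_df) <- c("Acc", "Sens", "Spec")
# create 100 partitions; each list element holds the row indices of one training set
train.idxs <- createDataPartition(1:nrow(df), times = 100, p = 0.8)
for(i in seq_along(train.idxs)) {
# create train and test sets
idx <- train.idxs[[i]]
train.df <- df[idx, ]
test.df <- df[-idx, ]
# calculate statistics as in the question (fit the model, compute overall.accuracy etc.) ...
result_df[i,1] <- overall.accuracy
result_df[i,2] <- overall.sensitivity
result_df[i,3] <- overall.specificity
}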
# check the datasets for the nth partition, e.g. n = 17
n <- 17
# train set
df[train.idxs[[n]], ]
# test set
df[-train.idxs[[n]], ]
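If you would rather not depend on caret, the same index-only idea can be sketched in base R. This swaps in plain sample() for createDataPartition, so unlike caret (which samples within percentile groups of y) there is no stratification. The iris data is a hypothetical stand-in for mydata1:

```r
# Hypothetical stand-in for the question's mydata1
mydata1 <- iris
set.seed(2024)
smp_size <- floor(0.8 * nrow(mydata1))

# A list of 100 integer vectors, each holding the training-row indices of one split
train_idx_list <- replicate(100, sample(seq_len(nrow(mydata1)), size = smp_size),
                            simplify = FALSE)

# Rebuild the sets of iteration n on demand
n <- 17
train_n <- mydata1[train_idx_list[[n]], ]
test_n <- mydata1[-train_idx_list[[n]], ]
```

Storing 100 index vectors takes far less memory than storing 200 data frames when the data set is large.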