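My goal is to create a function that splits a dataset into two parts (a training set and a test set) according to a given percentage, while keeping that same percentage within each level of a specified group. Sorry for my poor English; here is the function, which should make it clearer: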
split.g <- function (df, group, pc = 0.75) {
  group <- as.factor(df$group)
  list.df.g <- list()
  list.df.g.train <- list()
  list.df.g.test <- list()
  for (i in 1 : length(levels(group))) {
    list.df.g[[i]] <- subset(df, group == levels(group)[i])
    list.df.g.train[[i]] <- list.df.g[[i]][sample(nrow(list.df.g[[i]]), round((nrow(list.df.g[[i]])*pc), 0), replace = F), ]
    list.df.g.test[[i]] <- list.df.g[[i]][-(which(rownames(list.df.g[[i]]) %in% rownames(list.df.g.train[[i]]))), ]
  }
  list(do.call("rbind", list.df.g.train), do.call("rbind", list.df.g.test))
}
When I run this function on my data frame, I get the following error:
Error in list.df.g[[i]] <- subset(df, group == levels(group)[i]) :
attempt to select less than one element
However, if the function's code is changed slightly, it works fine:
split.g <- function (df, group, pc = 0.75) {
  group <- as.factor(df[, which(colnames(df) == group)])
  list.df.g <- list()
  list.df.g.train <- list()
  list.df.g.test <- list()
  for (i in 1 : length(levels(group))) {
    list.df.g[[i]] <- subset(df, group == levels(group)[i])
    list.df.g.train[[i]] <- list.df.g[[i]][sample(nrow(list.df.g[[i]]), round((nrow(list.df.g[[i]])*pc), 0), replace = F), ]
    list.df.g.test[[i]] <- list.df.g[[i]][-(which(rownames(list.df.g[[i]]) %in% rownames(list.df.g.train[[i]]))), ]
  }
  list(do.call("rbind", list.df.g.train), do.call("rbind", list.df.g.test))
}
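The only change is in the second line of the function. With $ the function does not work, and I don't understand why. Does anyone have an answer?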
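
To narrow down where the two versions diverge, here is a minimal sketch on a toy data frame. The names `toy` and `species` are made up for illustration and are not from my real data. It assumes, as in both versions of the function, that `group` is passed as a column name in a character string. It only compares what `df$group` and bracket indexing return, and what that does to the loop index.

```r
# Toy data frame; `toy` and the column `species` are hypothetical names.
toy <- data.frame(species = c("a", "a", "b", "b"), x = 1:4)
group <- "species"   # column name passed as a string, as in split.g()

# `$` does not evaluate `group`; it looks for a column literally named "group".
by_dollar <- toy$group
is.null(by_dollar)               # TRUE: there is no such column

# as.factor(NULL) has no levels, so 1:length(levels(...)) becomes 1:0,
# and the loop eventually reaches i = 0, where `list[[0]] <- ...` fails.
n_levels <- length(levels(as.factor(by_dollar)))
loop_idx <- 1:n_levels           # c(1, 0)
safe_idx <- seq_len(n_levels)    # integer(0): a loop over this would not run

# `[[` (like `[, name]`) evaluates `group` first, so the column is found.
by_bracket <- toy[[group]]
levels(as.factor(by_bracket))    # "a" "b"
```

If this sketch reflects what happens inside the function, the error message would come from the `i = 0` iteration, not from `subset()` itself.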