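I have some R code that splits image data into training and validation sets for a machine learning classification problem. That works fine, but now I need to add a final test set, and I am getting an error that I don't understand.
Here is the code I tried that works: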
image_subset <-
  cbind(file_paths = image_names_subset$file_paths, y_subset) %>%
  mutate(file_paths = as.character(file_paths))
y_cols <- colnames(y_subset)
subset_output_classes <- y_cols
train_val_split <- 0.70
train_image_rows <-
  sort(sample(nrow(image_subset), train_val_split * nrow(image_subset)))
val_image_rows <-
  which(!(seq(1, nrow(image_subset)) %in% train_image_rows))
image_subset_train <-
  image_subset[train_image_rows, ]
image_subset_val <-
  image_subset[-train_image_rows, ]
Here is the modified code, where I try to add the test set:
image_subset <-
  cbind(file_paths = image_names_subset$file_paths, y_subset) %>%
  mutate(file_paths = as.character(file_paths))
y_cols <- colnames(y_subset)
subset_output_classes <- y_cols
train_val_split <- 0.60
# Added: split the remaining 40% of the data in half for the validation and test sets
val_test_split <- 0.50
train_image_rows <-
  sort(sample(nrow(image_subset), train_val_split * nrow(image_subset)))
# Added
val_image_rows <-
  which(!(seq(1, nrow(image_subset)) %in% train_image_rows))
# Error occurs here when I run this command
test_image_rows <- sample(nrow(val_image_rows), val_test_split * nrow(val_image_rows))
val_image_rows2 <- which(!(seq(1, nrow(val_image_rows)) %in% test_image_rows))
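I expect 60% of the rows of image_subset to populate train_image_rows. This works.
I expect the remaining 40% of the rows to populate val_image_rows. This also works.
When I try to split off the test set to populate test_image_rows, I get an error: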
Error in sample.int(length(x), size, replace, prob) :
invalid 'size' argument
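The error appears to be reproducible in isolation. The sketch below uses a made-up vector `rows` as a stand-in for val_image_rows (it is not part of the code above) and assumes the failure comes from calling `nrow()` on a plain vector:

```r
# Hypothetical stand-in for val_image_rows: a plain integer vector of row numbers.
rows <- c(2L, 5L, 7L, 9L)

# nrow() relies on the dim attribute (matrices, data frames).
# A plain vector has no dim, so nrow() returns NULL; length() gives the count.
n_by_nrow   <- nrow(rows)
n_by_length <- length(rows)

# Arithmetic with NULL yields a zero-length numeric, which is not a usable size.
bad_size <- 0.5 * n_by_nrow

# Capture the error instead of stopping the script.
result <- tryCatch(sample(n_by_nrow, bad_size), error = function(e) e)

print(is.null(n_by_nrow))
print(n_by_length)
print(inherits(result, "error"))
```

If this matches the situation in the question, replacing `nrow()` with `length()` for the row-number vector should be enough to get a valid size.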
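Answer 0 (score: 0)
Found the problem:
- val_image_rows is a vector of row numbers, not a data frame like image_subset, so nrow(val_image_rows) returns NULL and sample() receives an invalid size. For a vector, use length() instead.
- The following code works: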
train_image_rows <-
  sort(sample(nrow(image_subset), train_val_split * nrow(image_subset)))
# All rows for validation and test
val_image_rows <-
  which(!(seq(1, nrow(image_subset)) %in% train_image_rows))
# Validation set rows only
val_image_rows2 <- sample(val_image_rows, val_test_split * length(val_image_rows))
# Test set rows only
test_image_rows <- val_image_rows[which(!(val_image_rows %in% val_image_rows2))]
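As a rough end-to-end check, the same three-way split can be sketched on a small made-up data frame. The toy `image_subset`, the `label` column, the name `holdout_rows`, the fixed seed, and the use of `round()` and `setdiff()` are all assumptions for illustration, not part of the original code:

```r
set.seed(42)  # assumption: fixed seed so the split is repeatable

# Hypothetical stand-in for image_subset: 10 file paths plus one label column.
image_subset <- data.frame(
  file_paths = paste0("img_", 1:10, ".png"),
  label = rep(c(0, 1), 5),
  stringsAsFactors = FALSE
)

train_val_split <- 0.60
val_test_split <- 0.50
n <- nrow(image_subset)

# 60% of the rows for training; round() makes the size an explicit whole number.
train_image_rows <- sort(sample(n, round(train_val_split * n)))

# Remaining rows, to be shared between validation and test.
holdout_rows <- setdiff(seq_len(n), train_image_rows)

# Sample positions within holdout_rows rather than its values, because
# sample(x, ...) treats a single number x as the range 1:x.
val_idx <- sample.int(length(holdout_rows),
                      round(val_test_split * length(holdout_rows)))
val_image_rows <- sort(holdout_rows[val_idx])
test_image_rows <- setdiff(holdout_rows, val_image_rows)

image_subset_train <- image_subset[train_image_rows, ]
image_subset_val <- image_subset[val_image_rows, ]
image_subset_test <- image_subset[test_image_rows, ]

# Sanity checks: the three sets do not overlap and together cover every row.
stopifnot(
  length(intersect(train_image_rows, val_image_rows)) == 0,
  length(intersect(train_image_rows, test_image_rows)) == 0,
  length(intersect(val_image_rows, test_image_rows)) == 0,
  setequal(c(train_image_rows, val_image_rows, test_image_rows), seq_len(n))
)
```

Indexing through `sample.int()` is a defensive choice: it behaves the same as sampling the row numbers directly in the usual case, but avoids a surprise if only one holdout row is left.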