Splitting an image dataset: training, validation, and test

Time: 2019-06-15 19:15:27

Tags: r validation

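I have some R code that splits image data into training and validation sets for a machine learning classification problem. That works fine, but now I need to add a final test set. I am getting an error and am not sure what is causing it.

This is the code I tried, and it works: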

image_subset <-
    cbind(file_paths = image_names_subset$file_paths, y_subset) %>%
    mutate(file_paths = as.character(file_paths))

  y_cols <- colnames(y_subset)

  subset_output_classes <- y_cols

  train_val_split <- 0.70

  train_image_rows <-
    sort(sample(nrow(image_subset), train_val_split * nrow(image_subset)))

  val_image_rows <-
    which(!(seq(1, nrow(image_subset)) %in% train_image_rows))

  image_subset_train <-
    image_subset[train_image_rows, ]

  image_subset_val <-
    image_subset[-train_image_rows, ]

This is the modified code, which produces the error:

  image_subset <-
    cbind(file_paths = image_names_subset$file_paths, y_subset) %>%
    mutate(file_paths = as.character(file_paths))

  y_cols <- colnames(y_subset)

  subset_output_classes <- y_cols

  train_val_split <- 0.60


#Added, want to be able to split the remaining 40% of data in 1/2 for validation and test sets 

val_test_split <- 0.50 

  train_image_rows <-
    sort(sample(nrow(image_subset), train_val_split * nrow(image_subset)))

# Added
  val_image_rows <-
    which(!(seq(1, nrow(image_subset)) %in% train_image_rows))

# Error occurs here when I run this command
  test_image_rows <- sample(nrow(val_image_rows), val_test_split * nrow(val_image_rows))

  val_image_rows2 <- which(!(seq(1, nrow(val_image_rows)) %in% test_image_rows))
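
  • I expect 60% of the rows of image_subset to populate train_image_rows. This works.

  • I expect the remaining 40% of the rows to populate val_image_rows. This also works.

  • When I try to split off the test set to populate test_image_rows, I get this error: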

Error in sample.int(length(x), size, replace, prob) : 
invalid 'size' argument

1 answer:

Answer 0 (score: 0)

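Figured out the problem:

  • val_image_rows is a vector of row numbers, not a data frame like image_subset. nrow() returns NULL for a plain vector, so the size passed to sample() is empty, which triggers the "invalid 'size' argument" error.

  • Use the following code instead. It samples from the vector itself and uses length() rather than nrow():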

  train_image_rows <-
    sort(sample(nrow(image_subset), train_val_split * nrow(image_subset)))

  # All rows for validation and test
  val_image_rows <-
    which(!(seq(1, nrow(image_subset)) %in% train_image_rows))

  # Validation set rows only
  val_image_rows2 <-
    sample(val_image_rows, val_test_split * length(val_image_rows))

  # Test set rows only: the held-out rows not chosen for validation
  test_image_rows <- setdiff(val_image_rows, val_image_rows2)
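
For reference, the same three-way split can be sketched as a single shuffle of the row indices, which avoids sampling from an intermediate vector. This is only a minimal sketch: the helper name `split_rows`, its `train_frac`, `val_frac`, and `seed` arguments, and the toy `image_subset` data frame are assumptions made up for illustration and are not part of the original post.

```r
# Hypothetical helper (not from the original post): split the row indices
# 1..n into train / validation / test sets using one random permutation.
split_rows <- function(n, train_frac = 0.60, val_frac = 0.20, seed = NULL) {
  stopifnot(n >= 3, train_frac > 0, val_frac > 0, train_frac + val_frac < 1)
  if (!is.null(seed)) set.seed(seed)

  shuffled <- sample.int(n)          # random permutation of 1..n
  n_train  <- floor(train_frac * n)
  n_val    <- floor(val_frac * n)

  list(
    train = sort(shuffled[seq_len(n_train)]),
    val   = sort(shuffled[n_train + seq_len(n_val)]),
    test  = sort(shuffled[(n_train + n_val + 1):n])  # everything left over
  )
}

# nrow() is NULL for a plain vector, which is what broke the original call;
# length() is the size function to use on a vector of row numbers.
row_ids <- c(2L, 5L, 9L)
is.null(nrow(row_ids))   # TRUE
length(row_ids)          # 3

# Toy stand-in for image_subset (made-up data, for illustration only).
image_subset <- data.frame(
  file_paths = paste0("img_", 1:10, ".png"),
  stringsAsFactors = FALSE
)

idx <- split_rows(nrow(image_subset), seed = 42)
image_subset_train <- image_subset[idx$train, , drop = FALSE]
image_subset_val   <- image_subset[idx$val, , drop = FALSE]
image_subset_test  <- image_subset[idx$test, , drop = FALSE]
```

Because the three index sets are slices of one permutation, they cannot overlap, and any rows left over by rounding fall into the test set.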