createDataPartition does not partition the data

Date: 2019-10-18 03:15:40

Tags: r partitioning training-data test-data

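I am trying to use createDataPartition in R to split a data frame into a training set and a test set, with the training set containing 60% of the data. When I run this code and inspect the resulting objects, SF.training_2 has all of the observations and SF.test_2 has none. Help? I also get an error message saying the summary command is not recognized, even though I have run summary successfully elsewhere in my code, which confuses and worries me.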

inTrain <- createDataPartition(
  y = paste(data_train_test$Rooms, 
            data_train_test$crime_nn5, 
            data_train_test$nhood, 
            data_train_test$BLDGSQFT, 
            data_train_test$estimate),
  p = .60, 
  list = FALSE)

SF.training_2 <- data_train_test[inTrain,]

summmary(SF.training_2)

SF.test_2 <- data_train_test[-inTrain,]

1 Answer:

Answer 0 (score: 0)

It looks like you are using the caret and tidyverse libraries. To help you, we need some sample data. Let's create a dummy dataset:

library(caret)
library(tidyverse)
data_train_test <- data.frame(Rooms = c("a","b","c","a","b","c","a","b","c","a"),
                          crime_nn5 = c(2,3,4,2,3,2,3,2,3,4),
                          nhood = c("Alvem","Rhye","Huttons","Rhye","Olan","Alvem","Olan","Huttons","Alvem","Rhye"),
                          BLDGSQFT = c(400,600,660,480,590,480,510,500,700,570),
                          estimate = c(34000, 55000, 60000, 37000, 50000, 45000, 48000, 51000, 80000, 52000))

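Now you want to partition the data. As you can read in the documentation (https://cran.r-project.org/web/packages/caret/caret.pdf), "y" must be a vector of outcomes, but in your code it is not: you pass a string pasted together from several columns. By the way, the error message you get from the summary function is caused by a typo: you wrote "summmary".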

inTrain <- createDataPartition(data_train_test$Rooms, times = 1, p = 0.6, list = FALSE)

SF.training_2 <- data_train_test[inTrain,]

summary(SF.training_2)

SF.test_2 <- data_train_test[-inTrain,]
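
To see why the pasted "y" from the question is a problem, it can help to count how many distinct values it produces. The sketch below uses only base R and a cut-down copy of the dummy data (the name df is just for illustration). createDataPartition samples within each distinct value of a character or factor "y", and as far as I understand caret's behaviour, a value that occurs only once cannot be split and stays in the training set, which would explain an empty test set.

```r
# Cut-down copy of the dummy data above (illustrative values only)
df <- data.frame(
  Rooms     = c("a","b","c","a","b","c","a","b","c","a"),
  crime_nn5 = c(2,3,4,2,3,2,3,2,3,4),
  BLDGSQFT  = c(400,600,660,480,590,480,510,500,700,570)
)

# Composite key built the same way as the "y" in the question
key <- paste(df$Rooms, df$crime_nn5, df$BLDGSQFT)

# Number of distinct strata versus number of rows
n_strata <- length(unique(key))
n_rows   <- nrow(df)
print(c(strata = n_strata, rows = n_rows))

# How many strata contain exactly one row
n_singletons <- sum(table(key) == 1)
print(n_singletons)

# For comparison: a single column gives strata with several rows each
print(table(df$Rooms))
```

If the number of strata is close to the number of rows, almost every group is a singleton and there is nothing left to hold out.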

This code should work for you. Please don't forget to provide a minimal reproducible data example next time, so that we can help you better.
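
If the outcome you actually want to model is numeric, you can pass that column as "y" instead. The sketch below assumes that estimate is the outcome, which the question does not state, and uses an arbitrary seed and the illustrative name df. For a numeric "y", the caret documentation says the values are grouped by percentiles and sampling is done within those groups.

```r
library(caret)

set.seed(42)  # arbitrary seed, only to make the split repeatable

# Illustrative data; treating `estimate` as the outcome is an assumption
df <- data.frame(
  BLDGSQFT = c(400,600,660,480,590,480,510,500,700,570),
  estimate = c(34000, 55000, 60000, 37000, 50000,
               45000, 48000, 51000, 80000, 52000)
)

# Row indices for the training set (one-column matrix because list = FALSE)
idx <- createDataPartition(y = df$estimate, p = 0.6, list = FALSE)

train <- df[idx, ]
test  <- df[-idx, ]

# Sanity checks: every row is in exactly one set, and the test set is not empty
print(c(train = nrow(train), test = nrow(test)))
stopifnot(nrow(train) + nrow(test) == nrow(df))
stopifnot(nrow(test) > 0)
```

With so few rows the split will not be exactly 60/40, since the proportion is applied within each group and rounded up.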

Regards,

Alexis