Question

I am trying to split my data, but I get this error:

 > training <- subset(data, split == "TRUE")
Error: Must subset rows with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 768 but subscript `r` has size 9.
Run `rlang::last_error()` to see where the error occurred.
>

This is the code I tried:

    split <- sample.split(data, SplitRatio = 0.7)
    split
    training <- subset(data, split == "TRUE")

Answer 1

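I'm not sure what the code is meant to do. I assume you're trying to split off 70% of the dataset to use as a training set? Since you've tagged this “tidyverse”, I'll also assume you're happy to use tidyverse principles to achieve the same goal.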

Consider the code below:

library(tidyverse)

dat <- data.frame(
        ID = c(1:1000)
        ,X = c(rnorm(1000, 0, 1))
) %>% mutate(
        y = X + c(rnorm(1000, 0, 0.05))
)

set.seed(100)

train <- sample_frac(dat, size = 0.7)
test <- dat %>% 
        anti_join(
                train, by = "ID"
        )

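I've loaded the tidyverse library, then created a data frame containing all of my readings (each with an associated ID). I've saved this data frame as “dat”. I've also called “set.seed” so that my split is reproducible.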

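From there, you simply use the “sample_frac” function to take a sample of the dataset, then use the “anti_join” function to collect the remaining rows as the test set.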
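As a quick sanity check on the approach above, here is a minimal sketch that rebuilds a toy “dat” and confirms the two sets are disjoint and together cover every row. It assumes, as in my example, that “ID” is unique per row, since “anti_join” relies on that to identify the rows already taken:

```r
library(dplyr)

set.seed(100)

# Toy data mirroring 'dat' above; ID must be unique per row
dat <- data.frame(ID = 1:1000, X = rnorm(1000, 0, 1))

# 70% of the rows, sampled without replacement
train <- sample_frac(dat, size = 0.7)

# Every row of 'dat' whose ID does not appear in 'train'
test <- anti_join(dat, train, by = "ID")

nrow(train)                            # 700
nrow(test)                             # 300
length(intersect(train$ID, test$ID))   # 0, so no row is in both sets
```

If your own data has no ID column, you would need to add one (or some other unique key) before using “anti_join” this way.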

Hope this helps.

Edit: if you paste that exact snippet into R and run it, you may also need to install the tidyverse package first:

install.packages("tidyverse")

Once that's done, you should be able to step through the code line by line to see what it is doing.

EDIT2: I've adapted your code to produce the training set you're looking for:

# Delete these lines
split <- sample.split(data, SplitRatio = 0.7)
split

# This should produce a dataset called 'training'
training <- sample_frac(data, size = 0.7)
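If you would rather keep “sample.split”, the error message hints at the likely cause: the subscript has size 9 while the data has 768 rows, which is what you would get if “sample.split” was handed the whole data frame and returned one flag per column instead of one per row. Below is a minimal sketch of the usual fix, passing a single column instead. I don't know your actual columns, so the data frame here is a made-up stand-in and the column name “Outcome” is an assumption:

```r
library(caTools)

set.seed(100)

# Hypothetical stand-in for 'data': 768 rows, with an assumed label column 'Outcome'
data <- data.frame(
        Glucose = rnorm(768, 120, 30)
        ,Outcome = rbinom(768, 1, 0.35)
)

# Pass one column (one element per row), not the whole data frame
split <- sample.split(data$Outcome, SplitRatio = 0.7)

length(split) == nrow(data)   # one flag per row

# Compare against the logical TRUE/FALSE rather than the string "TRUE"
training <- subset(data, split == TRUE)
testing  <- subset(data, split == FALSE)
```

This also gives you the complementary test set, which the “sample_frac” one-liner above does not produce on its own.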

Splitting data into training and testing data
