Sample groups and preserve row order

Date: 2017-07-05 23:44:28

Tags: r dplyr tidyverse

I have a data frame, for example:

df <- data.frame(id = factor(c(12321,12321,12321,4445,4445,4445,4445,787,787,787)),
                 word = c("please", "stop", "that", "the", "fox", "jumps", "that", "please", "eat", "noodles"),
                 word_id = c(12,5,28,99,214,800,28,12,78,912))

I'm trying to sample this data frame while keeping each id group intact and preserving the order of word and word_id within it.

I tried newDF <- df %>% group_by(id) %>% sample_frac(0.33), but this samples rows within each group instead of sampling whole groups.

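I want to produce a data frame that is a sample of the id groups in the original data frame, with the rows kept in their original order. So if I take a 33% sample of df, I should get 33% of the id groups, each complete and in order. The result would look something like this: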

newDF <- data.frame(id = factor(c(12321,12321,12321,4445,4445,4445,4445)),
                    word = c("please", "stop", "that", "the", "fox", "jumps", "that"),
                    word_id = c(12,5,28,99,214,800,28))              

1 answer:

Answer 0 (score: 0)

Expanding on alistaire's comment:

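library(dplyr)
library(tidyr)

# Collapse each id group into one row (its rows go into a list-column
# called `data`), sample a third of those rows, then expand them again.
newDF1 <- df %>%
  group_by(id) %>%
  nest() %>%
  ungroup() %>%          # nest() keeps the grouping in tidyr >= 1.0; drop it so
                         # sample_frac() samples across groups, not within each one
  sample_frac(1/3) %>%
  unnest(cols = data)    # rows inside each group keep their original order

# The id groups that were not sampled (e.g. as a hold-out set)
newDF2 <- anti_join(df, newDF1, by = "id")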
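The nest/unnest approach returns the sampled groups in random order. If the groups should also stay in their original order, another option is to sample the distinct id values and filter on them, since filter() never reorders rows. This is a minimal sketch that assumes dplyr is installed; the names ids, n_groups, keep_ids, sampled and rest are only illustrative:

```r
library(dplyr)

df <- data.frame(id = factor(c(12321,12321,12321,4445,4445,4445,4445,787,787,787)),
                 word = c("please", "stop", "that", "the", "fox", "jumps", "that", "please", "eat", "noodles"),
                 word_id = c(12,5,28,99,214,800,28,12,78,912))

set.seed(42)  # make the choice of groups reproducible

ids      <- unique(df$id)                # distinct groups, in order of appearance
n_groups <- max(1, round(length(ids) / 3))  # a third of the groups, but at least one
keep_ids <- sample(ids, n_groups)        # pick whole groups at random

# filter() keeps rows in their original order, so groups and words stay ordered
sampled <- df %>% filter(id %in% keep_ids)
rest    <- df %>% filter(!(id %in% keep_ids))  # the remaining groups
```

Here sampled plays the role of newDF1 and rest the role of newDF2. Because the sampling happens on ids rather than on rows, every group is either kept whole or dropped whole.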