I have a data frame, for example:
df <- data.frame(id = factor(c(12321,12321,12321,4445,4445,4445,4445,787,787,787)),
word = c("please", "stop", "that", "the", "fox", "jumps", "that", "please", "eat", "noodles"),
word_id = c(12,5,28,99,214,800,28,12,78,912))
I am trying to sample the data frame while keeping the id groups intact and preserving the order of word and word_id.
I tried newDF <- df %>% group_by(id) %>% sample_frac(0.33), but this takes a sample within each group rather than sampling whole groups.
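A quick check makes the behaviour visible. This is only a minimal sketch: the seed and the name per_group are arbitrary choices for illustration, and the data frame is repeated so the snippet runs on its own.

```r
library(dplyr)

df <- data.frame(id = factor(c(12321,12321,12321,4445,4445,4445,4445,787,787,787)),
                 word = c("please", "stop", "that", "the", "fox", "jumps", "that", "please", "eat", "noodles"),
                 word_id = c(12,5,28,99,214,800,28,12,78,912))

set.seed(1)  # arbitrary seed, only so the run is reproducible

# Grouped sampling: sample_frac() is applied separately inside every id group
per_group <- df %>%
  group_by(id) %>%
  sample_frac(0.33) %>%
  ungroup()

# Every id is still present, so rows were dropped inside each group
# instead of whole groups being dropped
n_distinct(per_group$id)
```

The count of distinct ids is unchanged after sampling, which is the opposite of what I need.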
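I would like to produce a data frame that contains a sample of whole id groups from the original data frame and preserves the order of the values in each column. So if I take a 33% sample of df, I would get 33% of the id groups, with the columns still in order.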
newDF <- data.frame(id = factor(c(12321,12321,12321,4445,4445,4445,4445)),
word = c("please", "stop", "that", "the", "fox", "jumps", "that"),
word_id = c(12,5,28,99,214,800,28))
Answer 0 (score: 0)
Building on alistaire's comment:
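library(dplyr)
library(tidyr)
# Collapse each id group into one row, sample whole groups, then expand again
newDF1 <- df %>%
  group_by(id) %>%
  nest() %>%
  ungroup() %>%            # nest() keeps the grouping in tidyr >= 1.0; drop it so rows (groups) are sampled
  sample_frac(1/3) %>%
  unnest(cols = c(data))   # "data" is the default list-column name created by nest()
# The id groups that were not sampled
newDF2 <- anti_join(df, newDF1, by = "id")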
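An alternative that avoids nesting is to sample the id values first and then filter on them. The following is only a sketch under a few assumptions: the names all_ids, n_keep, keep_ids, sampled and rest are made up for illustration, the seed is arbitrary, and the number of groups is rounded with a floor of one group. Because filter() keeps rows in their original order, the order of word and word_id within each group is unchanged.

```r
library(dplyr)

df <- data.frame(id = factor(c(12321,12321,12321,4445,4445,4445,4445,787,787,787)),
                 word = c("please", "stop", "that", "the", "fox", "jumps", "that", "please", "eat", "noodles"),
                 word_id = c(12,5,28,99,214,800,28,12,78,912))

set.seed(42)  # arbitrary seed, only so the run is reproducible

# Distinct group labels, as character to sidestep factor handling in sample()
all_ids <- as.character(unique(df$id))

# Number of groups to keep: one third of the groups, at least one (an assumption)
n_keep <- max(1, round(length(all_ids) / 3))
keep_ids <- sample(all_ids, n_keep)

# Rows of the sampled groups, and the rows of all remaining groups
sampled <- df %>% filter(id %in% keep_ids)
rest    <- df %>% filter(!id %in% keep_ids)
```

Here sampled plays the role of newDF1 and rest the role of newDF2. Unlike the nested version, the groups also stay in the order in which they appear in df.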