How do I sample n rows from each group when the groups have different numbers of rows?
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)
I have tried:
library(dplyr)
outdat <- df %>%
group_by(color) %>%
sample_n(nrow(.), replace = TRUE)
outdat
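But this returns a data.frame in which nrow(.) is the nrow of df, not of the per-group subset.
This SO post is close, but it specifies a fixed number of rows to draw. I need the number to be specific to each group, in dplyr.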
Answer 0 (score: 4)
Another workaround is to use sample_frac:
outdat <- df %>%
group_by(color) %>%
sample_frac(1, replace = TRUE)
outdat
# # A tibble: 40 x 3
# # Groups: color [4]
# X1 X2 color
# <dbl> <dbl> <chr>
# 1 0.69256186 0.97180252 blue
# 2 1.54384827 -0.20268802 blue
# 3 -1.20068240 -0.45402013 blue
# 4 2.63407877 -0.31644247 blue
# 5 1.20716737 -0.91380874 blue
# 6 0.01067475 1.02004679 blue
# 7 0.01067475 1.02004679 blue
# 8 1.79732108 -0.04072946 blue
# 9 0.01067475 1.02004679 blue
# 10 1.79732108 -0.04072946 blue
# # ... with 30 more rows
In addition, use outdat %>% ungroup() to remove the grouping.
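The example data above has ten rows in every group, so it does not show directly that the per-group sizes are preserved when groups differ. The following is a minimal sketch under assumed data: the `uneven` data frame, its column `x`, and the group sizes 3, 5 and 2 are hypothetical and not part of the original question.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so this sketch is reproducible

# Hypothetical data with unequal group sizes: 3 blue, 5 red, 2 pink rows
uneven <- data.frame(
  x = rnorm(10),
  color = rep(c("blue", "red", "pink"), times = c(3, 5, 2))
)

# Resample 100% of each group with replacement, then drop the grouping
resampled <- uneven %>%
  group_by(color) %>%
  sample_frac(1, replace = TRUE) %>%
  ungroup()

# Group sizes before and after resampling should match
original_sizes <- table(uneven$color)
resampled_sizes <- table(resampled$color)
```

Comparing `original_sizes` with `resampled_sizes` should show the same count for each color. Note that in dplyr 1.0.0 and later, sample_frac is marked as superseded, and slice_sample(prop = 1, replace = TRUE) is the suggested equivalent.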
Answer 1 (score: 3)
Another solution, using slice and sample.int.
Reusing the data from www:
outdat <- df %>%
group_by(color) %>%
slice(sample.int(n(), replace = TRUE))
outdat
X1 X2 color
1 1.71506499 -1.12310858 blue
2 0.07050839 2.16895597 blue
3 0.46091621 -0.40288484 blue
4 0.07050839 2.16895597 blue
5 0.07050839 2.16895597 blue
6 1.71506499 -1.12310858 blue
7 -1.26506123 -0.46665535 blue
8 1.55870831 -1.26539635 blue
9 0.12928774 1.20796200 blue
10 1.55870831 -1.26539635 blue
11 0.55391765 -0.28477301 pink
12 -0.29507148 -2.30916888 pink
13 -0.30596266 0.18130348 pink
14 -0.06191171 -1.22071771 pink
15 0.55391765 -0.28477301 pink
16 0.55391765 -0.28477301 pink
17 0.87813349 -0.70920076 pink
18 0.68864025 1.02557137 pink
19 -0.30596266 0.18130348 pink
20 0.68864025 1.02557137 pink
21 0.70135590 0.12385424 red
22 0.11068272 1.36860228 red
23 -1.96661716 0.58461375 red
24 0.40077145 -0.04287046 red
25 1.78691314 1.51647060 red
26 -0.55584113 -0.22577099 red
27 0.40077145 -0.04287046 red
28 1.78691314 1.51647060 red
29 -0.47279141 0.21594157 red
30 -0.47279141 0.21594157 red
31 -1.02600445 -0.33320738 yellow
32 -0.72889123 -1.01857538 yellow
33 1.25381492 2.05008469 yellow
34 0.83778704 0.44820978 yellow
35 1.25381492 2.05008469 yellow
36 -0.62503927 -1.07179123 yellow
37 -0.62503927 -1.07179123 yellow
38 0.83778704 0.44820978 yellow
39 -0.21797491 -0.50232345 yellow
40 -1.68669331 0.30352864 yellow
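To see why this works: inside slice() on a grouped data frame, n() is the size of the current group, and the positions passed to slice() are relative to that group. The sketch below isolates the sample.int part in base R for a single hypothetical group of five rows; `group_size`, `rows` and `idx` are illustrative names only.

```r
set.seed(1)  # arbitrary seed for this sketch

group_size <- 5  # stands in for n() within one hypothetical group

# A toy "group" of five rows
rows <- data.frame(id = letters[1:group_size])

# Draw group_size positions from 1..group_size, with replacement
idx <- sample.int(group_size, replace = TRUE)

# Selecting rows by those positions gives a bootstrap sample of the group
boot <- rows[idx, , drop = FALSE]
```

Because sample.int defaults its size to the first argument, the result always has as many rows as the group itself, and repeated positions show up as repeated rows.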
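Answer 2 (score: 2)
A workaround using the purrr package. It seems that the sample_n function cannot take n() as the size argument, probably because that argument does not accept vectorized input. However, if we split the data frame by color, we can apply sample_n to each piece with size set to that piece's nrow().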
# Set seed for reproducibility
set.seed(123)
# Create example data frame
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)
# Load packages
library(dplyr)
library(purrr)
outdat <- df %>%
# Split the data frame by color
split(.$color) %>%
# Apply the sample_n function to all data frames
map_dfr(~sample_n(., size = nrow(.), replace = TRUE))
outdat
# X1 X2 color
# 1 1.71506499 -1.12310858 blue
# 2 0.07050839 2.16895597 blue
# 3 0.46091621 -0.40288484 blue
# 4 0.07050839 2.16895597 blue
# 5 0.07050839 2.16895597 blue
# 6 1.71506499 -1.12310858 blue
# 7 -1.26506123 -0.46665535 blue
# 8 1.55870831 -1.26539635 blue
# 9 0.12928774 1.20796200 blue
# 10 1.55870831 -1.26539635 blue
# 11 0.55391765 -0.28477301 pink
# 12 -0.29507148 -2.30916888 pink
# 13 -0.30596266 0.18130348 pink
# 14 -0.06191171 -1.22071771 pink
# 15 0.55391765 -0.28477301 pink
# 16 0.55391765 -0.28477301 pink
# 17 0.87813349 -0.70920076 pink
# 18 0.68864025 1.02557137 pink
# 19 -0.30596266 0.18130348 pink
# 20 0.68864025 1.02557137 pink
# 21 0.70135590 0.12385424 red
# 22 0.11068272 1.36860228 red
# 23 -1.96661716 0.58461375 red
# 24 0.40077145 -0.04287046 red
# 25 1.78691314 1.51647060 red
# 26 -0.55584113 -0.22577099 red
# 27 0.40077145 -0.04287046 red
# 28 1.78691314 1.51647060 red
# 29 -0.47279141 0.21594157 red
# 30 -0.47279141 0.21594157 red
# 31 -1.02600445 -0.33320738 yellow
# 32 -0.72889123 -1.01857538 yellow
# 33 1.25381492 2.05008469 yellow
# 34 0.83778704 0.44820978 yellow
# 35 1.25381492 2.05008469 yellow
# 36 -0.62503927 -1.07179123 yellow
# 37 -0.62503927 -1.07179123 yellow
# 38 0.83778704 0.44820978 yellow
# 39 -0.21797491 -0.50232345 yellow
# 40 -1.68669331 0.30352864 yellow
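The reason nrow(.) gives the right size here is that split() returns a list with one data frame per group, so inside map_dfr the `.` refers to a single piece and not to the whole of df. A small base R sketch of that step, using hypothetical data with unequal group sizes (the `uneven` data frame and its sizes 3, 5 and 2 are assumptions for illustration):

```r
# Hypothetical data with unequal group sizes: 3 blue, 5 red, 2 pink rows
uneven <- data.frame(
  x = seq_len(10),
  color = rep(c("blue", "red", "pink"), times = c(3, 5, 2))
)

# One data frame per color, returned as a named list
pieces <- split(uneven, uneven$color)

# Number of rows in each piece, i.e. what nrow(.) would see per group
piece_sizes <- vapply(pieces, nrow, integer(1))
```

The list is ordered by the sorted group names (blue, pink, red), which is also why the combined output above lists the colors alphabetically.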