I have this data frame:
x y freq E
1 10 15 100 6
2 20 25 100 5
3 30 35 100 1
4 40 45 100 23
5 50 55 100 11
6 60 65 100 13
7 70 75 100 27
8 80 85 100 30
9 90 95 100 15
10 100 105 100 28
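From this, I want to create a list in which each member contains a randomly selected 90% or 80% of the rows (I think sample_n could be used for this).
I want something like this (in the example below the selection is not random, for simplicity):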
$`90%`
a b E freq
1 10 15 6 100
2 20 25 5 100
3 30 35 1 100
4 40 45 23 100
5 50 55 11 100
6 60 65 13 100
7 70 75 27 100
8 80 85 30 100
9 90 95 15 100
$`80%`
a b E freq
1 10 15 6 100
2 20 25 5 100
3 30 35 1 100
4 40 45 23 100
5 50 55 11 100
6 60 65 13 100
7 70 75 27 100
8 80 85 30 100
Answer 0 (score: 4)
You could do:
library(dplyr)
list("80%" = sample_frac(df, .8), "90%" = sample_frac(df, .9))
(assuming your data frame is called df)
$`80%`
x y freq E
7 70 75 100 27
8 80 85 100 30
9 90 95 100 15
3 30 35 100 1
10 100 105 100 28
5 50 55 100 11
6 60 65 100 13
1 10 15 100 6
$`90%`
x y freq E
3 30 35 100 1
6 60 65 100 13
8 80 85 100 30
1 10 15 100 6
9 90 95 100 15
7 70 75 100 27
10 100 105 100 28
4 40 45 100 23
5 50 55 100 11
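Because the rows are drawn at random, the output above will differ from run to run. The following is a minimal sketch of a repeatable variant; the seed value is arbitrary and the data frame is rebuilt here only so the snippet is self-contained. It also uses slice_sample(prop = ), which supersedes sample_frac in dplyr 1.0.0 and later, so it assumes a reasonably recent dplyr is installed.

```r
library(dplyr)

# Data frame rebuilt from the question so the example is self-contained
df <- data.frame(
  x    = seq(10, 100, by = 10),
  y    = seq(15, 105, by = 10),
  freq = 100,
  E    = c(6, 5, 1, 23, 11, 13, 27, 30, 15, 28)
)

# Arbitrary seed (an assumption, not part of the original answer);
# it only makes the random draws repeatable
set.seed(123)

# slice_sample(prop = ) samples a fraction of the rows without replacement
samples <- list(
  "90%" = slice_sample(df, prop = 0.9),
  "80%" = slice_sample(df, prop = 0.8)
)

# Number of rows kept in each list element
sapply(samples, nrow)
```

With the 10-row data frame from the question, the two elements should hold 9 and 8 rows respectively.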
As suggested by Cath, you can use sapply with seq to create a list of data frames ranging from 90% to 10%:
sapply(seq(0.9, 0.1, -0.1),
       # draw round(pct * nrow(df)) row indices without replacement, then subset df
       function(pct) {df[sample(seq_len(nrow(df)), round(pct*nrow(df)), replace=FALSE), ]},
       simplify=FALSE)
If you want to use sample_frac, you can modify her code like so:
sapply(seq(0.9, 0.1, -0.1),
function(pct) {sample_frac(df, pct)},
simplify=FALSE)
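The sapply calls above return an unnamed list, whereas the question asks for elements named `90%` and `80%`. One possible way to add the names, sketched in base R only: the vector pcts and the seed are illustrative assumptions, not part of the original answer.

```r
# Data frame rebuilt from the question so the example is self-contained
df <- data.frame(
  x    = seq(10, 100, by = 10),
  y    = seq(15, 105, by = 10),
  freq = 100,
  E    = c(6, 5, 1, 23, 11, 13, 27, 30, 15, 28)
)

set.seed(42)  # arbitrary seed, only so the draws are repeatable

# Percentages to keep (hypothetical helper vector, as whole numbers)
pcts <- c(90, 80)

samples <- lapply(pcts, function(pct) {
  # number of rows to keep for this percentage
  n_keep <- round(pct / 100 * nrow(df))
  # sample row indices without replacement and subset the data frame
  df[sample(nrow(df), n_keep), , drop = FALSE]
})

# Label each element with its percentage, e.g. "90%"
names(samples) <- paste0(pcts, "%")
```

Elements can then be accessed by name, for example samples[["80%"]]. Extending pcts to seq(90, 10, by = -10) would reproduce the full 90% to 10% range.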