Question

I have this data frame:

     x   y freq  E
1   10  15  100  6
2   20  25  100  5
3   30  35  100  1
4   40  45  100 23
5   50  55  100 11
6   60  65  100 13
7   70  75  100 27
8   80  85  100 30
9   90  95  100 15
10 100 105  100 28

From this, I want to create a list in which each member contains a randomly selected 90% or 80% of the rows (I think sample_n could do this).

I want something like this (in the example below, the selection is not random, for simplicity):

$`90%`
   a  b  E freq
1 10 15  6  100
2 20 25  5  100
3 30 35  1  100
4 40 45 23  100
5 50 55 11  100
6 60 65 13  100
7 70 75 27  100
8 80 85 30  100
9 90 95 15  100

$`80%`
   a  b  E freq
1 10 15  6  100
2 20 25  5  100
3 30 35  1  100
4 40 45 23  100
5 50 55 11  100
6 60 65 13  100
7 70 75 27  100
8 80 85 30  100

Answer 1

You could do:

library(dplyr)
list("80%" = sample_frac(df, .8), "90%" = sample_frac(df, .9))

(assuming your data frame is called df)

$`80%`
     x   y freq  E
7   70  75  100 27
8   80  85  100 30
9   90  95  100 15
3   30  35  100  1
10 100 105  100 28
5   50  55  100 11
6   60  65  100 13
1   10  15  100  6

$`90%`
     x   y freq  E
3   30  35  100  1
6   60  65  100 13
8   80  85  100 30
1   10  15  100  6
9   90  95  100 15
7   70  75  100 27
10 100 105  100 28
4   40  45  100 23
5   50  55  100 11
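
For anyone who wants to reproduce this without dplyr, here is a minimal base R sketch. It rebuilds the data frame from the question under the assumed name df, and uses a hypothetical helper, sample_rows, that is not part of any package. The seed value is arbitrary; it only makes the random draw repeatable.

```r
# Recreate the data frame from the question (the name `df` is an assumption).
df <- data.frame(
  x    = seq(10, 100, by = 10),
  y    = seq(15, 105, by = 10),
  freq = 100,
  E    = c(6, 5, 1, 23, 11, 13, 27, 30, 15, 28)
)

set.seed(42)  # arbitrary seed so the draw can be repeated

# Hypothetical helper: draw a fraction `pct` of the rows without replacement.
sample_rows <- function(data, pct) {
  n <- round(pct * nrow(data))
  data[sample(seq_len(nrow(data)), n), , drop = FALSE]
}

# Named list with one sampled data frame per percentage.
samples <- list("90%" = sample_rows(df, 0.9), "80%" = sample_rows(df, 0.8))
```

Note that in dplyr 1.0.0 and later, sample_frac is superseded by slice_sample(prop = ...), so slice_sample(df, prop = 0.8) may be the preferred spelling in newer code.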

As suggested by Cath, you can use sapply with seq to create a list of data frames for every fraction from 90% down to 10%:

sapply(seq(0.9, 0.1, -0.1),
       function(pct) {df[sample(seq_len(nrow(df)), round(pct*nrow(df)), replace=FALSE), ]},
       simplify=FALSE)

If you want to use sample_frac you can modify her code like so:

sapply(seq(0.9, 0.1, -0.1), 
       function(pct) {sample_frac(df, pct)}, 
       simplify=FALSE)
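
The lists returned by the two sapply calls above are unnamed, whereas the question asks for members labelled `90%`, `80%`, and so on. A possible way to add the labels, sketched in base R with lapply (the variable names pcts and samples are just illustrative, and df is rebuilt here so the snippet runs on its own):

```r
# Data frame from the question (the name `df` is an assumption).
df <- data.frame(
  x    = seq(10, 100, by = 10),
  y    = seq(15, 105, by = 10),
  freq = 100,
  E    = c(6, 5, 1, 23, 11, 13, 27, 30, 15, 28)
)

# Fractions to sample, from 90% down to 10%.
pcts <- seq(0.9, 0.1, by = -0.1)

# One randomly sampled subset of rows per fraction, without replacement.
samples <- lapply(pcts, function(pct) {
  df[sample(seq_len(nrow(df)), round(pct * nrow(df))), , drop = FALSE]
})

# Label each member with its percentage, e.g. "90%", "80%", ...
names(samples) <- paste0(round(pcts * 100), "%")
```

Individual members can then be pulled out by label, for example samples[["80%"]].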
