Create random data from a subset in R

Time: 2016-08-26 08:13:04

Tags: r dataframe range

I have a dataset with 10 rows and 5 columns. For example:

A <- c(15.0, 10.0, 5.50, 20, 22, 25, 30, 
         40, 50, 10.0)

B <- c(1, 30, 30, 6, 7, 10, 2, 25, 
         3, 27)

C <- c(1, 0, 0, 5, 15, 10, 20, 25, 
       30, 40)

D <- c(50, 100, 100, 500, 150, 100, 200, 250, 
       0, 0)

Date <- c("1997-05-01","1997-05-02","1997-05-03","1997-05-04","1997-05-05",
            "1997-05-06","1997-05-07","1997-05-08","1997-05-09","1997-05-10")

data <- data.frame(A, B, C, D, Date)

So I have a data frame in R:

  A      B      C     D      Date
----    ----  ----   ----    ----
15.0      1      1     50    1997-05-01
10.0     30      0    100    1997-05-02
etc...

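The ranges are based on quantiles. For A, I want the rows at or below the 25th percentile (11.25 for this data), and for B, the rows at or above the 75th percentile (26.5 for this data):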

quantile(data$A, c(.25, .50, .75))

quantile(data$B, c(.25, .50, .75))

One way is to filter the data frame on both conditions:

data[data$A <= quantile(data$A, 0.25) &
        data$B >= quantile(data$B, 0.75), ]
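As a quick check, the filter can be run on the example data to confirm which rows it keeps. This is only a sketch: it rebuilds the sample data frame so it runs on its own, and `a_cut`, `b_cut` and `subset_rows` are illustrative names that do not appear in the question.

```r
# Rebuild the example data from the question so the block is self-contained
data <- data.frame(
  A = c(15, 10, 5.5, 20, 22, 25, 30, 40, 50, 10),
  B = c(1, 30, 30, 6, 7, 10, 2, 25, 3, 27),
  C = c(1, 0, 0, 5, 15, 10, 20, 25, 30, 40),
  D = c(50, 100, 100, 500, 150, 100, 200, 250, 0, 0),
  Date = as.character(seq(as.Date("1997-05-01"), by = "day", length.out = 10))
)

# Compute each threshold once (default quantile type), then apply both conditions
a_cut <- quantile(data$A, 0.25)
b_cut <- quantile(data$B, 0.75)
subset_rows <- data[data$A <= a_cut & data$B >= b_cut, ]

# Original row labels of the rows that pass the filter
rownames(subset_rows)
```

With this data the filter should keep the rows dated 1997-05-02, 1997-05-03 and 1997-05-10.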

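So, from this 3-row subset I would like to create random data with the same number of rows as the original data (10 in this case). For example, the new data would be: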

  A      B      C     D      Date
----    ----  ----   ----    ----
10.0     30     0     100    1997-05-02
5.5      30     0     100    1997-05-03
10.0     27     40     0     1997-05-10
5.5      30     0     100    1997-05-03
10.0     27     40     0     1997-05-10 
10.0     30     0     100    1997-05-02
10.0     27     40     0     1997-05-10
5.5      30     0     100    1997-05-03
10.0     27     40     0     1997-05-10
10.0     30     0     100    1997-05-02

What is the best way to do this?

Thanks!

2 answers:

Answer 0: (score: 1)

A more mathematically oriented approach:

d3 <- data[data$A <= quantile(data$A, 0.25) &
           data$B >= quantile(data$B, 0.75), ]

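# Repeat the whole subset as many times as it fits, then append the leftover rows;
# seq_len() keeps the second part empty when nrow(data) is a multiple of nrow(d3)
final_df <- rbind(d3[rep(seq_len(nrow(d3)), nrow(data) %/% nrow(d3)), ],
                  d3[seq_len(nrow(data) %% nrow(d3)), ])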
rownames(final_df) <- NULL
final_df
#      A  B  C   D       Date
#1  10.0 30  0 100 1997-05-02
#2   5.5 30  0 100 1997-05-03
#3  10.0 27 40   0 1997-05-10
#4  10.0 30  0 100 1997-05-02
#5   5.5 30  0 100 1997-05-03
#6  10.0 27 40   0 1997-05-10
#7  10.0 30  0 100 1997-05-02
#8   5.5 30  0 100 1997-05-03
#9  10.0 27 40   0 1997-05-10
#10 10.0 30  0 100 1997-05-02

Answer 1: (score: 1)

Maybe you want something like this?

d_filtered <- data[data$A <= quantile(data$A, 0.25) &
                     data$B >= quantile(data$B, 0.75), ]
# Draw nrow(data) row indices from the subset, with replacement
d_new <- d_filtered[sample(seq_len(nrow(d_filtered)), nrow(data), replace = TRUE), ]
d_new
# One possible result (the rows drawn differ from run to run):
#       A  B  C   D       Date
#2    10.0 30  0 100 1997-05-02
#3     5.5 30  0 100 1997-05-03
#3.1   5.5 30  0 100 1997-05-03
#3.2   5.5 30  0 100 1997-05-03
#10   10.0 27 40   0 1997-05-10
#3.3   5.5 30  0 100 1997-05-03
#2.1  10.0 30  0 100 1997-05-02
#2.2  10.0 30  0 100 1997-05-02
#10.1 10.0 27 40   0 1997-05-10
#2.3  10.0 30  0 100 1997-05-02
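
Because `sample()` draws at random, each run gives different rows. If a reproducible result is wanted, one option is to fix the seed first and then reset the row names so they run from 1 to 10. This is a sketch, not part of the answer: the seed value 42 is arbitrary, `idx` is an illustrative name, and the example data is rebuilt so the block runs on its own.

```r
# Rebuild the example data from the question so the block is self-contained
data <- data.frame(
  A = c(15, 10, 5.5, 20, 22, 25, 30, 40, 50, 10),
  B = c(1, 30, 30, 6, 7, 10, 2, 25, 3, 27),
  C = c(1, 0, 0, 5, 15, 10, 20, 25, 30, 40),
  D = c(50, 100, 100, 500, 150, 100, 200, 250, 0, 0),
  Date = as.character(seq(as.Date("1997-05-01"), by = "day", length.out = 10))
)

d_filtered <- data[data$A <= quantile(data$A, 0.25) &
                     data$B >= quantile(data$B, 0.75), ]

set.seed(42)  # arbitrary seed, used only so the same draw can be repeated
idx <- sample(seq_len(nrow(d_filtered)), nrow(data), replace = TRUE)
d_new <- d_filtered[idx, ]
rownames(d_new) <- NULL  # replace labels such as 2, 3.1, 3.2 with 1..10
d_new
```

Sampling row indices rather than values keeps each row intact, so A, B, C, D and Date always stay together.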