I have a dataset with 10 rows and 5 columns. For example:
A <- c(15.0, 10.0, 5.50, 20, 22, 25, 30,
40, 50, 10.0)
B <- c(1, 30, 30, 6, 7, 10, 2, 25,
3, 27)
C <- c(1, 0, 0, 5, 15, 10, 20, 25,
30, 40)
D <- c(50, 100, 100, 500, 150, 100, 200, 250,
0, 0)
Date <- c("1997-05-01","1997-05-02","1997-05-03","1997-05-04","1997-05-05",
"1997-05-06","1997-05-07","1997-05-08","1997-05-09","1997-05-10")
data <- data.frame(A, B, C, D, Date)
So I have a data frame in R:
A B C D Date
---- ---- ---- ---- ----
15.0 1 1 50 1997-05-01
10.0 30 0 100 1997-05-02
etc...
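The ranges are based on quantiles. For A, I want the values less than or equal to the 25th percentile (11.25 for the vectors above), and for B the values greater than or equal to the 75th percentile (26.5 for the vectors above):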
quantile(data$A, c(.25, .50, .75))
quantile(data$B, c(.25, .50, .75))
One way is to filter the data frame on both conditions:
data[data$A <= quantile(data$A, 0.25) &
data$B >= quantile(data$B, 0.75), ]
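To check which rows this filter selects, the thresholds can be computed directly. This is a minimal sketch, assuming the vectors A and B exactly as defined in the code above and R's default quantile algorithm (type = 7); the names `a_cut`, `b_cut` and `keep` are introduced here only for illustration.

```r
# Vectors copied from the question
A <- c(15.0, 10.0, 5.50, 20, 22, 25, 30, 40, 50, 10.0)
B <- c(1, 30, 30, 6, 7, 10, 2, 25, 3, 27)

# Default quantile() (type = 7) interpolates linearly between order statistics;
# unname() drops the "25%" / "75%" labels so the results are plain numbers
a_cut <- unname(quantile(A, 0.25))  # lower-quartile threshold for A
b_cut <- unname(quantile(B, 0.75))  # upper-quartile threshold for B

# Row positions that satisfy both conditions
keep <- which(A <= a_cut & B >= b_cut)

print(c(a_cut = a_cut, b_cut = b_cut))
print(keep)
```

With these vectors the thresholds come out to 11.25 and 26.5, and rows 2, 3 and 10 pass the filter.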
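So, from this 3-row subset I would like to create a random dataset with the same number of rows as the original (10 in this case). For example, the new data would be:
A B C D Date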
---- ---- ---- ---- ----
10.0 30 0 100 1997-05-02
5.5 30 0 100 1997-05-03
10.0 27 40 0 1997-05-10
5.5 30 0 100 1997-05-03
10.0 27 40 0 1997-05-10
10.0 30 0 100 1997-05-02
10.0 27 40 0 1997-05-10
5.5 30 0 100 1997-05-03
10.0 27 40 0 1997-05-10
10.0 30 0 100 1997-05-02
What is the best way to do this?
Thanks!
Answer 0: (score: 1)
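A mathematically oriented approach: repeat the filtered rows as many whole times as they fit, then append the leftover rows. Note that this recycles the rows in order rather than sampling them at random.
d3 <- data[data$A <= quantile(data$A, 0.25) &
           data$B >= quantile(data$B, 0.75), ]
# Number of whole copies of d3 that fit into nrow(data)
n_full <- floor(nrow(data) / nrow(d3))
# Whole copies first, then the leftover rows.
# seq_len() selects no rows when the remainder is 0, whereas 1:0 would select row 1.
final_df <- rbind(d3[rep(seq_len(nrow(d3)), n_full), ],
                  d3[seq_len(nrow(data) - n_full * nrow(d3)), ])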
rownames(final_df) <- NULL
final_df
# A B C D Date
#1 10.0 30 0 100 1997-05-02
#2 5.5 30 0 100 1997-05-03
#3 10.0 27 40 0 1997-05-10
#4 10.0 30 0 100 1997-05-02
#5 5.5 30 0 100 1997-05-03
#6 10.0 27 40 0 1997-05-10
#7 10.0 30 0 100 1997-05-02
#8 5.5 30 0 100 1997-05-03
#9 10.0 27 40 0 1997-05-10
#10 10.0 30 0 100 1997-05-02
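The same cyclic fill can probably be written more compactly with base R's `rep_len()`, which recycles a vector up to a requested length. This is only a sketch of that alternative, assuming the data from the question; `idx` and `final_df2` are names made up for illustration.

```r
# Data from the question
A <- c(15.0, 10.0, 5.50, 20, 22, 25, 30, 40, 50, 10.0)
B <- c(1, 30, 30, 6, 7, 10, 2, 25, 3, 27)
C <- c(1, 0, 0, 5, 15, 10, 20, 25, 30, 40)
D <- c(50, 100, 100, 500, 150, 100, 200, 250, 0, 0)
Date <- c("1997-05-01", "1997-05-02", "1997-05-03", "1997-05-04", "1997-05-05",
          "1997-05-06", "1997-05-07", "1997-05-08", "1997-05-09", "1997-05-10")
data <- data.frame(A, B, C, D, Date)

# Same filter as in the answer above
d3 <- data[data$A <= quantile(data$A, 0.25) &
           data$B >= quantile(data$B, 0.75), ]

# rep_len() recycles the positions 1..nrow(d3) until nrow(data) positions exist
idx <- rep_len(seq_len(nrow(d3)), nrow(data))
final_df2 <- d3[idx, ]
rownames(final_df2) <- NULL
final_df2
```

This avoids computing the number of whole copies and the remainder by hand, and it gives the same row order as the output shown above.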
Answer 1: (score: 1)
Maybe you want something like this?
d_filtered <- data[data$A <= quantile(data$A, 0.25) &
data$B >= quantile(data$B, 0.75), ]
# Draw nrow(data) row positions from the filtered subset, with replacement
d_new <- d_filtered[sample(seq_len(nrow(d_filtered)), nrow(data), replace = TRUE), ]
#        A  B  C   D       Date
#2    10.0 30  0 100 1997-05-02
#3     5.5 30  0 100 1997-05-03
#3.1   5.5 30  0 100 1997-05-03
#3.2   5.5 30  0 100 1997-05-03
#10   10.0 27 40   0 1997-05-10
#3.3   5.5 30  0 100 1997-05-03
#2.1  10.0 30  0 100 1997-05-02
#2.2  10.0 30  0 100 1997-05-02
#10.1 10.0 27 40   0 1997-05-10
#2.3  10.0 30  0 100 1997-05-02
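Because `sample()` is random, the rows drawn will differ from run to run. The sketch below shows one way to make the draw repeatable and to tidy the row names, assuming the data from the question; the seed value 42 is arbitrary, and `idx` is a name introduced only for illustration.

```r
# Data from the question
A <- c(15.0, 10.0, 5.50, 20, 22, 25, 30, 40, 50, 10.0)
B <- c(1, 30, 30, 6, 7, 10, 2, 25, 3, 27)
C <- c(1, 0, 0, 5, 15, 10, 20, 25, 30, 40)
D <- c(50, 100, 100, 500, 150, 100, 200, 250, 0, 0)
Date <- c("1997-05-01", "1997-05-02", "1997-05-03", "1997-05-04", "1997-05-05",
          "1997-05-06", "1997-05-07", "1997-05-08", "1997-05-09", "1997-05-10")
data <- data.frame(A, B, C, D, Date)

d_filtered <- data[data$A <= quantile(data$A, 0.25) &
                   data$B >= quantile(data$B, 0.75), ]

# Sampling from an empty subset would fail, so stop early with a clear condition
stopifnot(nrow(d_filtered) > 0)

set.seed(42)  # arbitrary seed, used only so the same draw can be repeated
idx <- sample(seq_len(nrow(d_filtered)), size = nrow(data), replace = TRUE)

d_new <- d_filtered[idx, ]
rownames(d_new) <- NULL  # replace names like "3.1" with 1..10
d_new
```

Resetting the row names is optional; it only removes the suffixed names such as `3.1` that R generates when the same row is selected more than once.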