My data frame looks like this:
col1 date
23.2 2015-01-01
23.2 2015-01-01
22.1 2015-01-01
01.2 2015-01-01
11.9 2015-01-02
12.7 2015-01-02
23.2 2015-01-02
12.4 2015-01-03
23.7 2015-01-03
34.3 2015-01-03
73.4 2015-01-04
83.2 2015-01-04
91.2 2015-01-04
I need to randomly sample rows from this data frame, with the condition that each sampled row comes from a different date, like this:
col1 date
22.1 2015-01-01
23.2 2015-01-02
23.7 2015-01-03
83.2 2015-01-04
So I don't care which row is sampled within a date; I just want to make sure that each sampled row has a unique date.
Answer 0 (score: 1)
dd <- read.table(header = TRUE, text="col1 date
23.2 2015-01-01
23.2 2015-01-01
22.1 2015-01-01
01.2 2015-01-01
11.9 2015-01-02
12.7 2015-01-02
23.2 2015-01-02
12.4 2015-01-03
23.7 2015-01-03
34.3 2015-01-03
73.4 2015-01-04
83.2 2015-01-04
91.2 2015-01-04")
The approach from @thelatemail's comment is more elegant:
dd[with(dd, tapply(rownames(dd),date,sample,1) ),]
# col1 date
# 2 23.2 2015-01-01
# 6 12.7 2015-01-02
# 9 23.7 2015-01-03
# 13 91.2 2015-01-04
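To make the one-liner above easier to follow, here is a rough step-by-step sketch of the same idea. The names `picked` and `res` are only illustrative, and `dd` is rebuilt with `data.frame()` so the snippet runs on its own. The rows drawn will differ from run to run unless a seed is set.

```r
# Same sample data as above, rebuilt so this block runs on its own
dd <- data.frame(
  col1 = c(23.2, 23.2, 22.1, 1.2, 11.9, 12.7, 23.2,
           12.4, 23.7, 34.3, 73.4, 83.2, 91.2),
  date = rep(c("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04"),
             times = c(4, 3, 3, 3))
)

# Step 1: group the row names by date and draw one name from each group
picked <- tapply(rownames(dd), dd$date, sample, 1)

# Step 2: index the data frame with the drawn row names
res <- dd[picked, ]
print(res)
```

Because `rownames(dd)` is a character vector, `sample()` draws from the names themselves even when a date has only one row. With a numeric vector of length one, `sample(x, 1)` would instead draw from `1:x`.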
Or
set.seed(1)
do.call('rbind', by(dd, dd$date, FUN = function(x)
  x[sample(seq.int(nrow(x)), 1), ]))
# col1 date
# 2015-01-01 23.2 2015-01-01
# 2015-01-02 12.7 2015-01-02
# 2015-01-03 23.7 2015-01-03
# 2015-01-04 91.2 2015-01-04
Or
set.seed(1)
tbl <- table(dd$date)
# note: this relies on the rows of dd already being sorted by date
dd[unlist(Map(function(x) sample(seq.int(x), 1), tbl)) + cumsum(c(0, head(tbl, -1))), ]
# col1 date
# 2 23.2 2015-01-01
# 6 12.7 2015-01-02
# 9 23.7 2015-01-03
# 13 91.2 2015-01-04
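The index arithmetic in the `table()` version can be unpacked as follows. This is only a sketch: `counts`, `offsets`, `within` and `idx` are names made up for illustration, and the explicit `order()` step is an added assumption so that the offsets line up even if the input were not already sorted by date.

```r
# Same sample data as above, rebuilt so this block runs on its own
dd <- data.frame(
  col1 = c(23.2, 23.2, 22.1, 1.2, 11.9, 12.7, 23.2,
           12.4, 23.7, 34.3, 73.4, 83.2, 91.2),
  date = rep(c("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04"),
             times = c(4, 3, 3, 3))
)

# The offsets below are only valid when rows are grouped in sorted date order
dd <- dd[order(dd$date), ]

counts  <- as.integer(table(dd$date))        # rows per date: 4, 3, 3, 3
offsets <- cumsum(c(0L, head(counts, -1)))   # rows before each date: 0, 4, 7, 10

# Draw one position inside each date group (1..count)
within <- vapply(counts, function(n) sample.int(n, 1), integer(1))

# Position within the group plus the group's offset is the row position in dd
idx <- offsets + within
res <- dd[idx, ]
print(res)
```

Each entry of `idx` falls inside its own date block, which is why the result has exactly one row per date.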
Or
set.seed(1)
sp <- split(dd, dd$date)
do.call('rbind', lapply(sp, function(x) x[sample(seq.int(nrow(x)), 1), ]))
# col1 date
# 2015-01-01 23.2 2015-01-01
# 2015-01-02 12.7 2015-01-02
# 2015-01-03 23.7 2015-01-03
# 2015-01-04 91.2 2015-01-04
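One more variant, not taken from the approaches above but using the same base R toolbox: shuffle all rows, then keep the first row seen for each date with `duplicated()`. Treat it as a sketch; `shuffled` and `res` are illustrative names, and `dd` is rebuilt so the snippet is self-contained.

```r
# Same sample data as above, rebuilt so this block runs on its own
dd <- data.frame(
  col1 = c(23.2, 23.2, 22.1, 1.2, 11.9, 12.7, 23.2,
           12.4, 23.7, 34.3, 73.4, 83.2, 91.2),
  date = rep(c("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04"),
             times = c(4, 3, 3, 3))
)

# Put the rows in a random order
shuffled <- dd[sample(nrow(dd)), ]

# Keep only the first occurrence of each date in the shuffled order
res <- shuffled[!duplicated(shuffled$date), ]

# Optional: restore date order for display
res <- res[order(res$date), ]
print(res)
```

Since the shuffle is a uniform random permutation, every row within a date is equally likely to be the first one seen, so each date still contributes one uniformly chosen row. This version also does not require `dd` to be sorted by date.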