Randomly sampling one row per date from a data frame in R

Time: 2015-04-30 22:54:49

Tags: r

My data frame looks like this:

 col1        date
 23.2    2015-01-01
 23.2    2015-01-01
 22.1    2015-01-01
 01.2    2015-01-01
 11.9    2015-01-02
 12.7    2015-01-02
 23.2    2015-01-02
 12.4    2015-01-03
 23.7    2015-01-03
 34.3    2015-01-03
 73.4    2015-01-04
 83.2    2015-01-04
 91.2    2015-01-04

I need to randomly select a sample from this data frame, with the condition that each sampled row comes from a different date, like this:

col1        date
22.1    2015-01-01
23.2    2015-01-02
23.7    2015-01-03
83.2    2015-01-04

So I don't care which row gets sampled; I just want to make sure every sampled row has a unique date.

1 answer:

Answer 0: (score: 1)

dd <- read.table(header = TRUE, text="col1        date
23.2    2015-01-01
23.2    2015-01-01
22.1    2015-01-01
01.2    2015-01-01
11.9    2015-01-02
12.7    2015-01-02
23.2    2015-01-02
12.4    2015-01-03
23.7    2015-01-03
34.3    2015-01-03
73.4    2015-01-04
83.2    2015-01-04
91.2    2015-01-04")

@thelatemail's comment is more elegant:

dd[with(dd, tapply(rownames(dd),date,sample,1) ),]
#    col1       date
# 2  23.2 2015-01-01
# 6  12.7 2015-01-02
# 9  23.7 2015-01-03
# 13 91.2 2015-01-04
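
The one-liner above can be unpacked into two steps, which may make it easier to adapt. This is only a sketch: the data frame is rebuilt inline so the snippet runs on its own, and the names `picked` and `res` are illustrative, not part of the original comment.

```r
# Same toy data as above, built inline so the snippet is self-contained.
dd <- data.frame(
  col1 = c(23.2, 23.2, 22.1, 1.2, 11.9, 12.7, 23.2,
           12.4, 23.7, 34.3, 73.4, 83.2, 91.2),
  date = rep(c("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04"),
             times = c(4, 3, 3, 3)),
  stringsAsFactors = FALSE
)

set.seed(1)  # only for reproducibility

# Step 1: within each date, draw one row name.
# tapply() passes the extra argument 1 to sample() as its size.
picked <- tapply(rownames(dd), dd$date, sample, 1)

# Step 2: picked holds one row name per date (named by date);
# indexing by row name returns the matching rows.
res <- dd[picked, ]
res
```

Because the row names are character strings, `sample()` always draws from the names themselves, even when a date has only one row.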

set.seed(1)
do.call('rbind', by(dd, dd$date, FUN = function(x)
  x[sample(seq.int(nrow(x)), 1), ]))
#            col1       date
# 2015-01-01 23.2 2015-01-01
# 2015-01-02 12.7 2015-01-02
# 2015-01-03 23.7 2015-01-03
# 2015-01-04 91.2 2015-01-04
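
The `by()` variant samples a row position with `sample(seq.int(nrow(x)), 1)` instead of sampling values directly. That pattern is worth keeping when adapting the code, because `?sample` documents that a single number of at least 1 is treated as "sample from `1:x`". The sketch below illustrates the behaviour; `pick_one` is a hypothetical helper introduced here for illustration only.

```r
set.seed(1)  # only for reproducibility

# A single number >= 1 is treated as "sample from 1:x",
# not as a set containing just that one value.
draws <- replicate(200, sample(10, 1))
range(draws)

# Hypothetical helper: sample one element by position,
# which behaves the same for vectors of any length.
pick_one <- function(v) v[sample.int(length(v), 1)]
safe_draws <- replicate(200, pick_one(10))
unique(safe_draws)
```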

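set.seed(1)
# Count rows per date. This variant assumes dd is already sorted by date,
# so that each date occupies one contiguous block of rows.
tbl <- table(dd$date)
# Draw a position within each block, then shift it by the block's starting offset.
dd[unlist(Map(function(x) sample(seq.int(x), 1), tbl)) + cumsum(c(0, head(tbl, -1))), ]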
#    col1       date
# 2  23.2 2015-01-01
# 6  12.7 2015-01-02
# 9  23.7 2015-01-03
# 13 91.2 2015-01-04

set.seed(1)
sp <- split(dd, dd$date)
do.call('rbind', lapply(sp, function(x) x[sample(seq.int(nrow(x)), 1), ]))
#            col1       date
# 2015-01-01 23.2 2015-01-01
# 2015-01-02 12.7 2015-01-02
# 2015-01-03 23.7 2015-01-03
# 2015-01-04 91.2 2015-01-04
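
One more base R option, offered as a sketch rather than as part of the original answer: shuffle all rows once, then keep the first row seen for each date. In a uniformly random permutation, the first row of each date should be equally likely to be any of that date's rows. The names `shuffled` and `one_per_date` are illustrative.

```r
# Same toy data as above, built inline so the snippet is self-contained.
dd <- data.frame(
  col1 = c(23.2, 23.2, 22.1, 1.2, 11.9, 12.7, 23.2,
           12.4, 23.7, 34.3, 73.4, 83.2, 91.2),
  date = rep(c("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04"),
             times = c(4, 3, 3, 3)),
  stringsAsFactors = FALSE
)

set.seed(1)  # only for reproducibility

# Random permutation of all rows.
shuffled <- dd[sample(nrow(dd)), ]

# Keep the first occurrence of each date in the shuffled order.
one_per_date <- shuffled[!duplicated(shuffled$date), ]

# Optional: restore chronological order for display.
one_per_date <- one_per_date[order(one_per_date$date), ]
one_per_date
```

Unlike the `table()`/`cumsum()` variant, this does not require the data frame to be sorted by date beforehand.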