Multiple random samples in R

Asked: 2017-03-08 16:11:20

Tags: r multisampling

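I currently have a data frame called liquidation. I want to draw 30 random samples of 1000 rows each from it, record which sample each account came from, and then combine all 30 samples into one new data frame.

Here is how I currently do it by hand, using the dplyr package for the random sampling. I would like to simplify this so that it is easy to repeat: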

Sample_1 <- liquidation %>%
  sample_n(1000)
Sample_1$Obs <- 1

Sample_2 <- liquidation %>%
  sample_n(1000)
Sample_2$Obs <- 2

Sample_3 <- liquidation %>%
  sample_n(1000)
Sample_3$Obs <- 3
....
Sample_30 <- liquidation %>%
  sample_n(1000)
Sample_30$Obs <- 30

I then combine them into a single data frame:

Combined <- rbind(Sample_1, Sample_2, Sample_3, Sample_4, Sample_5,
                  Sample_6, Sample_7, Sample_8, Sample_9, Sample_10,
                  Sample_11, Sample_12, Sample_13, Sample_14, Sample_15,
                  Sample_16, Sample_17, Sample_18, Sample_19, Sample_20,
                  Sample_21, Sample_22, Sample_23, Sample_24, Sample_25,
                  Sample_26, Sample_27, Sample_28, Sample_29, Sample_30)

str(Combined)
'data.frame':   30000 obs. of  31 variables:

2 answers:

Answer 0: (score: 3)

Here is an example using mtcars (randomly selecting 5 rows, 10 times):

Combined <- bind_rows(replicate(10, mtcars %>% sample_n(5), simplify = FALSE), .id = "Obs")

We use the base function replicate() to repeat the sampling multiple times. Then we use dplyr's bind_rows() to combine the samples and keep track of which sample each row came from.
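Applied to the question's case (30 samples of 1000 rows), the same pattern might look like the sketch below. The liquidation data frame here is a made-up stand-in, since the real columns are not shown in the question, and the set.seed() call is an addition for anyone who wants the draws to be reproducible.

```r
library(dplyr)

set.seed(42)  # optional: fix the RNG seed so the same samples are drawn each run

# Hypothetical stand-in for `liquidation`; column names are assumptions.
liquidation <- data.frame(account = 1:5000, balance = runif(5000))

# Draw 30 samples of 1000 rows and stack them, labelling each row's sample in Obs
Combined <- bind_rows(
  replicate(30, liquidation %>% sample_n(1000), simplify = FALSE),
  .id = "Obs"
)

# Convert the sample label to an integer so it matches the hand-written version
Combined$Obs <- as.integer(Combined$Obs)

# Every sample should contribute exactly 1000 rows
table(Combined$Obs)
```

Because sample_n() samples without replacement by default, no account appears twice within a sample, but the same account can appear in several different samples.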

Answer 1: (score: 1)

You should be able to wrap this in a function (assuming Sample_20 and the like are temporary and you won't need them later):

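# Requires library(dplyr) for %>%, sample_n() and mutate()
sampling <- function(x, nSamples = 30, nRows = 1000) {
  # Draw nSamples samples of nRows rows each, tag each with its index, then stack them
  do.call('rbind', lapply(seq_len(nSamples), function(n) {
    x %>% sample_n(nRows) %>% mutate(Obs = n)
  }))
}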

You can then run:

combined <- sampling(liquidation)
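As a possible variation, the sketch below adds an optional seed argument and uses slice_sample(), which supersedes sample_n() in dplyr 1.0.0 and later. The names sampling2 and seed are illustrative and not part of the original answer; mtcars is used only so the example runs on its own.

```r
library(dplyr)

# Hypothetical variant of sampling(); `sampling2` and `seed` are illustrative names.
sampling2 <- function(x, nSamples = 30, nRows = 1000, seed = NULL) {
  # Only touch the RNG state when a seed is explicitly supplied
  if (!is.null(seed)) set.seed(seed)
  bind_rows(lapply(seq_len(nSamples), function(i) {
    # slice_sample() draws nRows rows without replacement by default
    x %>% slice_sample(n = nRows) %>% mutate(Obs = i)
  }))
}

# 10 samples of 5 rows from the built-in mtcars data
combined <- sampling2(mtcars, nSamples = 10, nRows = 5, seed = 1)
```

Calling sampling2() again with the same seed should return the same rows, which can help when the sampled data frame needs to be rebuilt later.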