Resampling from the sample with the maximum value

Asked: 2016-04-21 08:06:53

Tags: r statistics resampling

I have the following problem:
I have 4 bags, each holding 20 values, and I randomly draw 10 samples from each of the 4 bags:

bag1 = bag2 = bag3 = bag4 = numeric(20)   # the vectors must exist before elements are assigned
for (i in 1:20){
  bag1[i] = sample(0:50,1)
  bag2[i] = sample(0:50,1)
  bag3[i] = sample(0:50,1)
  bag4[i] = sample(0:50,1)
}

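bag1value = bag2value = bag3value = bag4value = 0   # running totals start at 0
for (j in 1:10){
    samp=sample(1:20,1)   # the same random position is read from all four bags
    bag1value=bag1value+bag1[samp]
    bag2value=bag2value+bag2[samp]
    bag3value=bag3value+bag3[samp]
    bag4value=bag4value+bag4[samp]
}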

Now I want to draw another 10 values from the bag that had the largest total in the first sample. I could do it like this:

secondsample=0   # running total for the second draw
maxbag=max(bag1value,bag2value,bag3value,bag4value)
if (maxbag==bag1value){
    for (j1 in 1:10){
      samp=sample(1:20,1)
      secondsample=secondsample+bag1[samp]
    }
} else if (maxbag==bag2value){
    for (j1 in 1:10){
      samp=sample(1:20,1)
      secondsample=secondsample+bag2[samp]
    }
}   # ...and likewise for bag3 and bag4

But I am looking for a more elegant way to do this.

1 Answer:

Answer 0 (score: 1)

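The code as originally posted does not run. Also, the loop variables j and j1 are never used inside the two for loops that compute the bag values and secondsample.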

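In any case, a more elegant way to handle the data is to use a list or an array. The first loop can be replaced by the matrix "bags" below, whose columns 1:4 represent bags 1 to 4: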

# one column per bag; 0:50 matches the range used in the question
bags <- sapply(1:4, function(x) sample(0:50, 20, replace = TRUE))
colnames(bags) <- paste0("bag", 1:4)
head(bags)

     bag1 bag2 bag3 bag4
[1,]    7    1   14   16
[2,]   50   23   49    7
[3,]   14   48   26   10
[4,]   42   11    8   10
[5,]   31   43   11    9
[6,]    5   20   27   19
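
The answer mentions a list as an alternative container. As a minimal sketch of that variant (the name bag_list and the seed are assumptions made for illustration, not part of the original answer), the same data could be held as a named list with one vector per bag:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# replicate() with simplify = FALSE returns a list of 4 vectors,
# each holding 20 values drawn with replacement from 0:50
bag_list <- replicate(4, sample(0:50, 20, replace = TRUE), simplify = FALSE)
names(bag_list) <- paste0("bag", 1:4)

str(bag_list)   # shows the four named vectors
```

A list is the more natural choice if the bags might ever hold different numbers of values; with equally sized bags the matrix form keeps column-wise operations simpler.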

Draw 10 values from each bag:

new <- sapply(colnames(bags), function(x) sample(bags[,x], 10, replace = FALSE))
head(new)

     bag1 bag2 bag3 bag4
[1,]   14    1   49    2
[2,]   31   26   13   18
[3,]    1   48   14    9
[4,]   38   23   27    6
[5,]   24   23   26   10
[6,]   14   42    8   29

Determine which bag contains the largest value:

max.new <- sapply(1:4, function(x) max(new[,x]))
max.new

[1] 38 48 49 29

# which.max() returns a single index (the first one on ties), so max.bag is always one bag
max.bag <- colnames(bags)[which.max(max.new)]
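
The question ranks the bags by the sum of the 10 drawn values rather than by their single largest value. A sketch of that variant, assuming independent draws with replacement for each bag (the question's loop reuses one random position across all bags) and using the illustrative names first, totals and best.bag:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

bags <- sapply(1:4, function(x) sample(0:50, 20, replace = TRUE))
colnames(bags) <- paste0("bag", 1:4)

# 10 draws per bag, with replacement as in the question's loop
first <- apply(bags, 2, sample, size = 10, replace = TRUE)

totals <- colSums(first)               # one total per bag
best.bag <- names(which.max(totals))   # name of the bag with the largest total

totals
best.bag
```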

Resample from the bag with the maximum value:

secondsample <- sample(bags[,max.bag], 10)
secondsample

[1]  8 13 27 14 31 13 49 29 38  5
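
The steps above could be wrapped into a single function. This is only a sketch: resample_best_bag and its arguments n and score are hypothetical names introduced here, and the seed is arbitrary.

```r
# Hypothetical helper: draw n values per bag, score each draw,
# then draw n more values from the best-scoring bag.
resample_best_bag <- function(bags, n = 10, score = max) {
  # one score per column, computed on n values drawn without replacement
  scores <- apply(bags, 2, function(col) score(sample(col, n)))
  best <- which.max(scores)   # first bag on ties
  list(bag = colnames(bags)[best],
       secondsample = sample(bags[, best], n))
}

set.seed(2016)  # arbitrary seed, only so the sketch is reproducible
bags <- sapply(1:4, function(x) sample(0:50, 20, replace = TRUE))
colnames(bags) <- paste0("bag", 1:4)

res <- resample_best_bag(bags)
res$bag
res$secondsample
```

Passing score = sum instead of the default would rank the bags by the total of their first draw, which is closer to what the question's loop computes.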