随机将人员分配到不同大小的组和类别

时间:2016-04-22 11:06:27

标签: r

我需要将人们随机分配到组和类别中。不幸的是,我真的不知道从哪里开始。我试图用下面的例子来解释我的问题。任何有关这方面的帮助将非常感激。

我有 207 '家庭类型A'和 408 '家庭类型B'类别。总共有 1524 人需要分配到207家庭类型A或408家庭类型B的类别。但是,对于家庭类型A,1524人也需要组合在 2到7 组中,或者对于家庭类型B, 2到6 组合。

最终结果应该是1524人被分配到207组(包含2到7人)和408组(包含2到6人)。

对组的分配必须是随机的,并且可以使用所需的组大小的任意组合,因为如果不使用组类别则无关紧要(例如,如果仅包含家庭类型A的207个组,则会很好一次2,3或4人,或另外5人和7人)。

我想象一个看起来像这样的输出:

GroupSize <- c(2:7)
Num.Groups <- 0
Num.People <- 0
HouseTypeA <- data.frame(GroupSize, Num.Groups, Num.People)
GroupSize <- c(2:6)
HouseTypeB <- data.frame(GroupSize, Num.Groups, Num.People)

将'Num.Groups'列汇总为207或408,两个数据帧之间的'Num.People'之和为1524.

3 个答案:

答案 0 :(得分:1)

我分三步尝试了这个:

  1. 制作房屋类型清单
  2. 将每个人分配到房屋,检查 有空格(类型a最大值= 7,类型b最大值= 6)
  3. 检查每栋房屋是否至少有2人。如果没有,抓住一个人 来自另一所房子,不允许他们的人数下降 低于2.

    homeType=rep(c("a", "b"), times=c(207, 408))
    
    H <- vector(mode="list", length(homeType))
    for(i in seq(H)){
      H[[i]]$type <- homeType[i]
      H[[i]]$n <- 0
    }
    H
    
    # Place people in houses up to max number of people
    npeople <- 1524
    for(i in seq(npeople)){
      placed_in_house <- FALSE
      while(!placed_in_house){
        house_num <- sample(length(H), 1)
        if(H[[house_num]]$type == "a"){
          if(H[[house_num]]$n < 7){
            H[[house_num]]$n <- H[[house_num]]$n + 1
            placed_in_house <- TRUE
          }
        }
        if(H[[house_num]]$type == "b"){
          if(H[[house_num]]$n < 6){
            H[[house_num]]$n <- H[[house_num]]$n + 1
            placed_in_house <- TRUE
          }
        }
      }
    }
    H
    hist(unlist(lapply(H, function(x)x$n)))
    
    # move people around to get up to min number of people
    for(i in seq(H)){
      while(H[[i]]$n < 2){
        knock_on_door <- sample(length(H), 1)
        if( H[[knock_on_door]]$n > 2){
          H[[i]]$n <- H[[i]]$n + 1 # house i takes 1 person
          H[[knock_on_door]]$n <- H[[knock_on_door]]$n - 1 # house knock_on_door loses 1 person
        }
      }
    }
    H
    Ha <- H[which(lapply(H, function(x){x$type}) == "a")]
    Hb <- H[which(lapply(H, function(x){x$type}) == "b")]
    
    
    op <- par(mfcol=c(1,2))
    breaks=2:7
    hist(unlist(lapply(Ha, function(x)x$n)), breaks=breaks, col=8, xlab="people per house", main="type A") # 2:7
    hist(unlist(lapply(Hb, function(x)x$n)), breaks=breaks, col=8, xlab="people per house", main="type B") # 2:6
    par(op)
    
    sum(unlist(lapply(Ha, function(x)x$n))) + sum(unlist(lapply(Hb, function(x)x$n)))
    # [1] 1524
    
    
    Houses <- data.frame(
      Num.Groups = seq(H),
      type=unlist(lapply(H, function(x){x$type})),
      Num.People=unlist(lapply(H, function(x){x$n}))
    )
    head(Houses)
    

    正如您所看到的,每个房子的总人数和人口分布都有效。

  4. enter image description here

答案 1 :(得分:1)

有两个外部类别,A和B.外部类别A总是有207个内部类别,每个类别的大小为2:7,外部类别B总是有408个内部类别,每个类别的大小为2:6。

这意味着:

  • 外部类别A所需的最少人数必须为207*2 == 414
  • 外部类别A中可能的最大人数为207*7 == 1449
  • 外部类别B中所需的最少人数必须为408*2 == 816
  • 外部类别B中可能的最大人数为408*6 == 2448

我们可以进一步得出结论:

  • 总共需要的最少人数为414+816 == 1230
  • 最多可能的人数为1449+2448 == 3897

我已将这些值捕获为我在解决方案开始时指定的常量,并在此处复制以供参考:

Ainner <- 207L;
Binner <- 408L;
Amin <- Ainner*2L;
Bmin <- Binner*2L;
Amax <- Ainner*7L;
Bmax <- Binner*6L;
NPmin <- Amin+Bmin;
NPmax <- Amax+Bmax;

我设计了我的解决方案,我们希望尽量减少我们需要做的循环量,以便找到合适的人员分类。我们可以通过最初计算人口总数随机划分为两个外部类别A和B来更接近这一目标。

一旦完成,我们必须为每个内部类别选择随机大小,以便外部类别总数完全达到我们的分区数。这不是一件容易的事;我还没有想到使用PRNG原语来即时生成满足约束的值的方法。我想我们需要迭代。

我最小化循环的解决方案是导出正态分布,其分布参数(即平均值和标准偏差)在选定的外部类别大小(代码中的AsizeBsize)上进行参数化,以这样的方式设计,即分布可能产生其聚合将非常接近每个外部类别中所需总数的值,同时仍然在内部类别大小选择中提供显着的随机性。然后我们可以运行一个循环,根据需要在每次迭代时进行一次递增或递减,直到我们达到所需的总数。循环仍然是必要的,但迭代次数最小化。

均方程的形式遵循以下模式:

{extended-min} + {extended-range}*({size}-{min})/({max}-{min})
  • {extended-min}是范围的底端,它在内部类别大小范围之外延伸。例如,对于外部类别A,内部类别大小范围是2:7,我的扩展范围是1.5:7.5。我使用了扩展范围,因为我想允许在函数的最末端产生的随机变量有一些变化,即使它们理论上应该接近内部类别大小范围的末尾的极限。它使得公式的推导和操作比我试图使其在每一端达到理论上理想的极限时更容易,并且它在美学上更令人愉悦地以这种方式绘制它。请注意,正态分布偏离的可能性(事实上,确定性)将落在内部类别大小范围之外不是问题,因为我使用pmin()pmax()来限制端点处的值。 / LI>
  • {extended-range}是完整的扩展范围,例如外部类别A为6。
  • {size}是为外部类别随机选择的大小。
  • {min}{max}是外部类别大小的最小和最大允许值。

这些是我推导出的实际方程式:

Amean <- function(Asize) 1.5 + 6*(Asize-Amin)/(Amax-Amin);
Bmean <- function(Bsize) 1.5 + 5*(Bsize-Bmin)/(Bmax-Bmin);

标准差方程的形式遵循以下模式:

{mult1}*exp(-({mult2}*(2*{size}-{min}-{max})/({max}-{min}))^2)
  • {mult1}{mult2}只是我为了获得最终正态分布的直观行为而设计的乘数,基于将分布映射到所需内部类别分配的目标尽可能接近
  • {size}与以前相同。
  • {min}{max}与以前相同。

标准偏差形式的基本原理是标准偏差将关于有效尺寸范围的中点对称,产生中间地面尺寸的宽正态分布,并且朝向极端变得更窄。请注意,指数中的商等价于:

({size}-({min}+{max})/2)/(({max}-{min})/2)

因此,它的大小与内部类别大小范围的中点之间的偏差除以范围的一半。这提供了[-1,1]的域,然后乘以{mult2}并求平方。对于极值,得到的负指数变得非常大,导致整个指数变得非常小。这个小的标准偏差使得正态分布的窄度朝向内部类别大小范围的末端。

这些是实际的等式:

Asd <- function(Asize) 1.3*exp(-(1.22*(2*Asize-Amin-Amax)/(Amax-Amin))^2);
Bsd <- function(Bsize) 1.3*exp(-(1.22*(2*Bsize-Bmin-Bmax)/(Bmax-Bmin))^2);

这里有一些代码我写的很好地可视化正常的分布:

外部类别A

xlim <- c(-3,10);
ylim <- c(0,1.7);
xticks <- seq(xlim[1L],xlim[2L]);
yticks <- seq(ylim[1L],ylim[2L],0.1);
plot(NA,xlim=xlim,ylim=ylim,xlab='Inner Category Size',ylab='P',axes=F);
axis(1L,xticks,xticks);
axis(2L);
box();
abline(v=xticks,col='lightgrey');
abline(h=yticks,col='lightgrey');
x <- seq(xlim[1L],xlim[2L],0.01);
Asize.col <- data.frame(Asize=trunc(seq(Amin,Amax,len=7L)),col=c('red','green','blue','brown','gold','cyan','magenta'),stringsAsFactors=F);
for (ri in seq_len(nrow(Asize.col))) {
    Asize <- Asize.col$Asize[ri];
    col <- Asize.col$col[ri];
    lines(x,dnorm(x,Amean(Asize),Asd(Asize)),col=col,lwd=2);
};
with(Asize.col,legend(-2.5,1.65,Asize,col,col,title=expression(bold(Asize))),cex=0.7);
subEnv <- as.environment(mget(c('Amin','Amax')));
text(0.5,1.6,parse(text=paste0('mu == ',deparse(do.call(substitute,c(list(body(Amean)),subEnv))))),pos=4L);
text(0.5,1.53,parse(text=paste0('sigma == ',deparse(do.call(substitute,c(list(body(Asd)),subEnv))))),pos=4L);

normal-2-A

外部类别B

xlim <- c(-3,10);
ylim <- c(0,1.7);
xticks <- seq(xlim[1L],xlim[2L]);
yticks <- seq(ylim[1L],ylim[2L],0.1);
plot(NA,xlim=xlim,ylim=ylim,xlab='Inner Category Size',ylab='P',axes=F);
axis(1L,xticks,xticks);
axis(2L);
box();
abline(v=xticks,col='lightgrey');
abline(h=yticks,col='lightgrey');
x <- seq(xlim[1L],xlim[2L],0.01);
Bsize.col <- data.frame(Bsize=trunc(seq(Bmin,Bmax,len=7L)),col=c('red','green','blue','brown','gold','cyan','magenta'),stringsAsFactors=F);
for (ri in seq_len(nrow(Bsize.col))) {
    Bsize <- Bsize.col$Bsize[ri];
    col <- Bsize.col$col[ri];
    lines(x,dnorm(x,Bmean(Bsize),Bsd(Bsize)),col=col,lwd=2);
};
with(Bsize.col,legend(-2.5,1.65,Bsize,col,col,title=expression(bold(Bsize))),cex=0.7);
subEnv <- as.environment(mget(c('Bmin','Bmax')));
text(0.3,1.6,parse(text=paste0('mu == ',deparse(do.call(substitute,c(list(body(Bmean)),subEnv))))),pos=4L);
text(0.3,1.53,parse(text=paste0('sigma == ',deparse(do.call(substitute,c(list(body(Bsd)),subEnv))))),pos=4L);

normal-2-B

解决方案

## fixed constants
Ainner <- 207L;
Binner <- 408L;
Amin <- Ainner*2L;
Bmin <- Binner*2L;
Amax <- Ainner*7L;
Bmax <- Binner*6L;
NPmin <- Amin+Bmin;
NPmax <- Amax+Bmax;

## normal mean and sd functions
Amean <- function(Asize) 1.5 + 6*(Asize-Amin)/(Amax-Amin);
Asd <- function(Asize) 1.3*exp(-(1.22*(2*Asize-Amin-Amax)/(Amax-Amin))^2);
Bmean <- function(Bsize) 1.5 + 5*(Bsize-Bmin)/(Bmax-Bmin);
Bsd <- function(Bsize) 1.3*exp(-(1.22*(2*Bsize-Bmin-Bmax)/(Bmax-Bmin))^2);

## primary implementation function
bgoldst <- function(NP,seed=NULL,check=F) {

    if (!is.null(seed)) set.seed(seed);

    ## in order to parameterize the total number of ppl, must consider exactly which constraints impose which limits
    ## the A min 414 and max 1449 are fixed based on the 207 and 408 inner categories
    ## the B min 816 and max 2448 are also fixed for the same reason
    ## the mins cannot be changed by the parameterized total number of ppl
    ## moreover, we should validate that the total number of ppl is sufficient for all inner categories
    ## this requires 414+816 == 1230
    if (NP<NPmin) stop(paste0('insufficient NP=',NP,'.'));
    ## additionally we should validate that the total number of ppl does not exceed the maximum possible that can be handled by the inner categories
    ## this is 1449+2448 == 3897
    if (NP>NPmax) stop(paste0('excessive NP=',NP,'.'));
    ## the A max varies from 1449 down to 414, depending on NP
    ## the B max varies from 2448 down to 816, depending on NP
    ## so what we can do as the first step is calculate the maxes based on NP
    AminCur <- max(Amin,NP-Bmax);
    BminCur <- max(Bmin,NP-Amax);
    AmaxCur <- min(Amax,NP-Bmin);
    BmaxCur <- min(Bmax,NP-Amin);
    ## now we can select a random division from the available space
    Asize <- if (AminCur==AmaxCur) AminCur else sample(AminCur:AmaxCur,1L);
    Bsize <- NP-Asize;

    ## will use carefully designed sliding normal distributions to couple the probability distribution to the constraints
    ## see global functions for formulae

    ## randomly choose inner category sizes for A
    ## we know the exact number of inner categories we need, so choose that many inner category sizes using the normal dist
    AG <- pmin(7L,pmax(2L,as.integer(rnorm(Ainner,Amean(Asize),Asd(Asize)))));
    ## iterate adding/removing one member at a time to get to the required size
    AGsum <- sum(AG);
    if (AGsum>Asize) {
        while (AGsum>Asize) {
            i <- which(AG>2L);
            if (length(i)>1L) i <- sample(i,1L); ## don't let sample()'s inconsistency screw us
            AG[i] <- AG[i]-1L;
            AGsum <- AGsum-1L;
        }; ## end while
    } else if (AGsum<Asize) {
        while (AGsum<Asize) {
            i <- which(AG<7L);
            if (length(i)>1L) i <- sample(i,1L); ## don't let sample()'s inconsistency screw us
            AG[i] <- AG[i]+1L;
            AGsum <- AGsum+1L;
        }; ## end while
    }; ## end if

    ## randomly choose inner category sizes for B
    BG <- pmin(6L,pmax(2L,as.integer(rnorm(Binner,Bmean(Bsize),Bsd(Bsize)))));
    ## iterate adding/removing one member at a time to get to the required size
    BGsum <- sum(BG);
    if (BGsum>Bsize) {
        while (BGsum>Bsize) {
            i <- which(BG>2L);
            if (length(i)>1L) i <- sample(i,1L); ## don't let sample()'s inconsistency screw us
            BG[i] <- BG[i]-1L;
            BGsum <- BGsum-1L;
        }; ## end while
    } else if (BGsum<Bsize) {
        while (BGsum<Bsize) {
            i <- which(BG<6L);
            if (length(i)>1L) i <- sample(i,1L); ## don't let sample()'s inconsistency screw us
            BG[i] <- BG[i]+1L;
            BGsum <- BGsum+1L;
        }; ## end while
    }; ## end if

    ## combine into data.frame, randomly distributing the inner categories across inner category ids
    res <- data.frame(
        outer=rep(c('A','B'),c(Ainner,Binner)),
        inner=c(1:Ainner,1:Binner),
        num=c(sample(AG),sample(BG))
    );

    if (check) bgoldst.check(NP,res,seed);

    res;

}; ## end bgoldst()

## validation check helper function
bgoldst.check <- function(NP,res,seed=NULL) {
    seedStr <- if (is.null(seed)) 'NULL' else as.character(seed);
    ## A
    with(res[res$outer=='A',],{
        if (length(outer)!=Ainner) stop(paste0('outer category A has wrong number of inner categories ',length(outer),'!=',Ainner,' [',seedStr,'].'));
        x <- num>=2L & num<=7L;
        if (!all(x)) stop(paste0('outer category A has invalid inner category size ',num[which(!x)[1L]],' [',seedStr,'].'));
        x <- sum(num);
        if (!(x>=Amin && x<=Amax)) stop(paste0('outer category A has invalid size ',x,' [',seedStr,'].'));
    });
    ## B
    with(res[res$outer=='B',],{
        if (length(outer)!=Binner) stop(paste0('outer category B has wrong number of inner categories ',length(outer),'!=',Binner,' [',seedStr,'].'));
        x <- num>=2L & num<=6L;
        if (!all(x)) stop(paste0('outer category B has invalid inner category size ',num[which(!x)[1L]],' [',seedStr,'].'));
        x <- sum(num);
        if (!(x>=Bmin && x<=Bmax)) stop(paste0('outer category B has invalid size ',x,' [',seedStr,'].'));
    });
    ## all
    with(res,{
        x <- sum(num);
        if (x!=NP) stop(paste0('result has invalid total size ',x,' [',seedStr,'].'));
    });
}; ## end bgoldst.check()

## one-off demo
res <- bgoldst(1524L,1L,T);
head(res,10L); tail(res,10L);
##    outer inner num
## 1      A     1   2
## 2      A     2   3
## 3      A     3   3
## 4      A     4   2
## 5      A     5   2
## 6      A     6   2
## 7      A     7   2
## 8      A     8   4
## 9      A     9   2
## 10     A    10   2
##     outer inner num
## 606     B   399   3
## 607     B   400   2
## 608     B   401   4
## 609     B   402   2
## 610     B   403   2
## 611     B   404   2
## 612     B   405   6
## 613     B   406   2
## 614     B   407   2
## 615     B   408   5
table(res$outer,res$num);
##
##       2   3   4   5   6
##   A 158  28  13   8   0
##   B 282  68  33  18   7

## extensive testing
for (seed in seq_len(1e5L)) {
    print(seed);
    set.seed(seed);
    bgoldst(sample(NPmin:NPmax,1L),NULL,T);
}; ## end for
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
##
## ... snip ... (all succeed, all fast)
##
## [1] 99996
## [1] 99997
## [1] 99998
## [1] 99999
## [1] 100000

首次尝试

原创介绍: 好的,不幸的是,OP的措辞有些含糊不清,特别是他所说的&#34;如果是一个群组类别则无关紧要没有使用&#34;。我认为这意味着内部类别可能只有零成员。我的解决方案基于这个前提。我的假设是不正确的,这改变了一切。我将按照我写作的方式留下我的答案研究新的解决方案。

## primary implementation function
bgoldst <- function(seed=NULL,check=F) {

    if (!is.null(seed)) set.seed(seed);

    ## divide 1524 into two outer categories -- sample the acceptable divisions
    ## notably, cannot allow only 1 person into either outer category
    ## also, cannot take more than 1449 ppl into A; most it can hold is 7*207 == 1449
    ## B can hold any number from zero to 1524
    NHA <- sample(c(0L,2:1449),1L);
    NHB <- 1524L-NHA;

    ## also, since 1449 would *require* 7 ppl in every category, must prep normal dist params
    ## specifically, will slide mean from 4.5 over towards (and past) 7, sd smaller the closer we are to 1449
    Amean <- 4.5 + 3*NHA/1449;
    Asd <- 1.5*exp(-(NHA/1e3)^1.6);

    ## divide A into 207 inner categories of 2:7 -- iterative sampling until valid
    ## should be very few iterations, since we over-append with high likelihood
    AG <- integer();
    if (NHA>0L) {
        repeat {
            AG <- c(AG,pmin(7L,pmax(2L,as.integer(rnorm(max(1,NHA/3),Amean,Asd)))));
            ## find last inner category
            AGcum <- cumsum(AG);
            AGLastIndex <- which(AGcum>=NHA)[1L];
            if (!is.na(AGLastIndex)) { ## sufficient coverage
                ## also must guard against too lightly allocated to fit within inner category num limit
                if (AGLastIndex>207L) {
                    AG <- integer(); ## hard reset
                } else {
                    break; ## done
                }; ## end if
            }; ## end if
        }; ## end repeat
        ## remove surplus inner categories and surplus in last inner category
        length(AG) <- AGLastIndex;
        AG[AGLastIndex] <- AG[AGLastIndex] - (AGcum[AGLastIndex]-NHA);
        if (AG[AGLastIndex]==1L) { ## special case for last inner category remnant of 1; must even out against previous inner category
            ## also, can't join max inner category size since it would overflow the last group
            ## also, can't take 1 less than previous inner category size since then *it* would be left with 1
            takeOpt <- setdiff(1:if (AG[AGLastIndex-1L]==7L) 5L else AG[AGLastIndex-1L],AG[AGLastIndex-1L]-1L);
            take <- if (length(takeOpt)==1L) takeOpt else sample(takeOpt,1L); ## don't let sample()'s inconsistent behavior screw us
            AG[AGLastIndex-1L] <- AG[AGLastIndex-1L]-take;
            AG[AGLastIndex] <- AG[AGLastIndex]+take;
        }; ## end if
    }; ## end if

    ## divide Bs into 408 inner categories of 2:6 -- iterative sampling until valid
    BG <- integer();
    if (NHB>0L) {
        repeat {
            BG <- c(BG,sample(2:6,max(1,NHB/3),replace=T));
            ## find last inner category
            BGcum <- cumsum(BG);
            BGLastIndex <- which(BGcum>=NHB)[1L];
            if (!is.na(BGLastIndex)) { ## sufficient coverage
                ## also must guard against too lightly allocated to fit within inner category num limit
                if (BGLastIndex>408L) {
                    BG <- integer(); ## hard reset
                } else {
                    break; ## done
                }; ## end if
            }; ## end if
        }; ## end repeat
        ## remove surplus inner categories and surplus in last inner category
        length(BG) <- BGLastIndex;
        BG[BGLastIndex] <- BG[BGLastIndex] - (BGcum[BGLastIndex]-NHB);
        if (BG[BGLastIndex]==1L) { ## special case for last inner category remnant of 1; must even out against previous inner category
            ## also, can't join max inner category size since it would overflow the last group
            ## also, can't take 1 less than previous inner category size since then *it* would be left with 1
            takeOpt <- setdiff(1:if (BG[BGLastIndex-1L]==6L) 4L else BG[BGLastIndex-1L],BG[BGLastIndex-1L]-1L);
            take <- if (length(takeOpt)==1L) takeOpt else sample(takeOpt,1L); ## don't let sample()'s inconsistent behavior screw us
            BG[BGLastIndex-1L] <- BG[BGLastIndex-1L]-take;
            BG[BGLastIndex] <- BG[BGLastIndex]+take;
        }; ## end if
    }; ## end if

    ## combine into data.frame, randomly distributing the inner categories across inner category ids
    res <- data.frame(
        outer=rep(c('A','B'),c(207L,408L)),
        inner=c(1:207,1:408),
        num=c(sample(c(AG,rep(0L,207L-length(AG)))),sample(c(BG,rep(0L,408L-length(BG)))))
    );

    if (check) bgoldst.check(res,seed);

    res;

}; ## end bgoldst()

## validation check helper function
bgoldst.check <- function(res,seed=NULL) {
    seedStr <- if (is.null(seed)) 'NULL' else as.character(seed);
    ## A
    with(res[res$outer=='A',],{
        if (length(outer)!=207L) stop(paste0('outer category A has wrong number of inner categories ',length(outer),'!=207 [',seedStr,'].'));
        x <- num>=2L & num<=7L | num==0L;
        if (!all(x)) stop(paste0('outer category A has invalid inner category size ',num[which(!x)[1L]],' [',seedStr,'].'));
        x <- sum(num);
        if (!(x>=0L && x<=1524L)) stop(paste0('outer category A has invalid size ',x,' [',seedStr,'].'));
    });
    ## B
    with(res[res$outer=='B',],{
        if (length(outer)!=408L) stop(paste0('outer category B has wrong number of inner categories ',length(outer),'!=408 [',seedStr,'].'));
        x <- num>=2L & num<=6L | num==0L;
        if (!all(x)) stop(paste0('outer category B has invalid inner category size ',num[which(!x)[1L]],' [',seedStr,'].'));
        x <- sum(num);
        if (!(x>=0L && x<=1524L)) stop(paste0('outer category B has invalid size ',x,' [',seedStr,'].'));
    });
    ## all
    with(res,{
        x <- sum(num);
        if (x!=1524L) stop(paste0('result has invalid total size ',x,' [',seedStr,'].'));
    });
}; ## end bgoldst.check()

## one-off demo
res <- bgoldst(1L,T);
head(res,10L); tail(res,10L);
##    outer inner num
## 1      A     1   5
## 2      A     2   4
## 3      A     3   0
## 4      A     4   0
## 5      A     5   0
## 6      A     6   5
## 7      A     7   0
## 8      A     8   5
## 9      A     9   0
## 10     A    10   4
##     outer inner num
## 606     B   399   3
## 607     B   400   5
## 608     B   401   5
## 609     B   402   0
## 610     B   403   6
## 611     B   404   0
## 612     B   405   5
## 613     B   406   2
## 614     B   407   0
## 615     B   408   0
table(res$outer,res$num);
##
##       0   2   3   4   5   6   7
##   A 125   1   9  25  29  15   3
##   B 116  71  57  54  50  60   0

## extensive testing
for (seed in seq_len(1e5L)) {
    print(seed);
    bgoldst(seed,T);
}; ## end for
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
##
## ... snip ... (all succeed, all fast)
##
## [1] 99996
## [1] 99997
## [1] 99998
## [1] 99999
## [1] 100000

正态分布行为:

xlim <- c(-3,10);
ylim <- c(0,1.7);
xticks <- seq(xlim[1L],xlim[2L]);
yticks <- seq(ylim[1L],ylim[2L],0.1);
plot(NA,xlim=xlim,ylim=ylim,xlab='AG',ylab='P',axes=F);
axis(1L,xticks,xticks);
axis(2L);
box();
abline(v=xticks,col='lightgrey');
abline(h=yticks,col='lightgrey');
x <- seq(xlim[1L],xlim[2L],0.01);
Amean <- function(NHA) 4.5 + 3*NHA/1449;
Asd <- function(NHA) 1.5*exp(-(NHA/1e3)^1.6);
NHA.col <- data.frame(NHA=c(0,300,600,900,1200,1449),col=c('red','green','blue','gold','cyan','magenta'),stringsAsFactors=F);
for (ri in seq_len(nrow(NHA.col))) {
    NHA <- NHA.col$NHA[ri];
    col <- NHA.col$col[ri];
    lines(x,dnorm(x,Amean(NHA),Asd(NHA)),col=col,lwd=2);
};
with(NHA.col,legend(-2.5,1.65,NHA,col,col,title=expression(bold(NHA))),cex=0.7);
text(-2.5,0.92,parse(text=paste0('mu == ',deparse(body(Amean)))),pos=4L);
text(-2.5,0.87,parse(text=paste0('sigma == ',deparse(body(Asd)))),pos=4L);

normal-1

答案 2 :(得分:0)

我们可以为A类(207)和B类(408)中的每个类别分配2个人,为其他1524 - 2*207 - 2*408 = 294人分配样本类别

rm(list=ls())
options(stringsAsFactors=FALSE)
numPeeps <- 1524
aSize <- 207
bSize <- 408
aRange <- c(2,7)
bRange <- c(2,6)
vacancies <- c(paste(rep("A", diff(aRange)*aSize), rep(1:aSize, each=diff(aRange))),
    paste(rep("B", diff(bRange)*bSize), rep(1:bSize, each=diff(bRange))))

#perform allocation by assigning 2 people to each category in type A (207) and type B (408) and sample categories for the rest of the 1524-2*207-2*408=294 people
allocation <- c(paste(rep("A", aRange[1]*aSize), rep(1:aSize, each=aRange[1])),
    paste(rep("B", bRange[1]*bSize), rep(1:bSize, each=bRange[1])),
    sample(vacancies, numPeeps - aRange[1]*aSize - bRange[1]*bSize))

#spit out results
library(dplyr)
library(stringi)
df <- setNames(data.frame(stri_split_fixed(allocation, " ", simplify=T)), 
               c("AB","Category"))
df$Category <- as.integer(df$Category)
houseDensity <- group_by(df, AB, Category) %>% 
    summarise(GroupSize = n()) %>% 
    ungroup() %>%
    select(-Category) %>%
    group_by(AB, GroupSize) %>%
    summarise(Num.Groups = n()) %>%
    mutate(Num.People = Num.Groups * GroupSize)

# Source: local data frame [9 x 4]
# Groups: AB
# 
# AB GroupSize Num.Groups Num.People
# 1  A         2        113        226
# 2  A         3         68        204
# 3  A         4         21         84
# 4  A         5          4         20
# 5  A         6          1          6
# 6  B         2        266        532
# 7  B         3        118        354
# 8  B         4         22         88
# 9  B         5          2         10

sum(houseDensity[houseDensity$AB=="A", "Num.Groups"])
# [1] 207

sum(houseDensity[houseDensity$AB=="B", "Num.Groups"])
# [1] 408

sum(houseDensity$Num.People)
# [1] 1524