Randomly assigning values to a data frame/matrix with groups of different sizes that meet multiple criteria

Date: 2016-08-16 13:07:47

Tags: r

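This is a follow-up to a previous question I asked, but it adds an extra layer of complexity, hence a new question.

I have two sets of groups in the example below (39 and 380). What I need to do is allocate 889 people into the 39 groups, each made up of 2 to 7 people, and the 380 groups, each made up of 2 to 6 people.

However, there are restrictions on the total number of people in certain groups. In the example below, the maximum value allowed for each row is in column X6.

Take the example below. If, in row 2, 6 were allocated in column X2 and 120 in column X4, the total number of people would be 18 (6 * 3) + 240 (120 * 2) = 258, so this would be fine because it is less than 324.

So what I'm after for each row is a value of X1 * X2 + X3 * X4 (making column X5) that is less than or equal to X6, with the sum of X2 being 39, the sum of X4 being 380 and the sum of X5 being 889. Ideally, any solution should be as random as possible (repeating it should give a different solution) and should also work when the values differ from 889, 39 and 380.

Thanks!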

DF <- data.frame(matrix(0, nrow = 7, ncol = 6))
DF[,1] <- c(2:7,"Sum")
DF[7,2] <- 39
DF[2:6,3] <- 2:6
DF[7,4] <- 380
DF[7,5] <- 889
DF[1:6,6] <- c(359, 324, 134, 31, 5, 2)
DF[1,3:4] <- NA
DF[7,3] <- NA
DF[7,6] <- NA
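
As a minimal sketch of the per-row arithmetic described above (the helper name `row_total` is illustrative only and is not part of the code in this post), the row 2 example works out as follows:

```r
# Hypothetical helper, not from the original post: people placed in one row,
# given the group sizes (x1, x3) and the number of groups of each size (x2, x4).
row_total <- function(x1, x2, x3, x4) x1 * x2 + x3 * x4

# Row 2 of the example: 6 groups of 3 people and 120 groups of 2 people
x5_row2 <- row_total(x1 = 3, x2 = 6, x3 = 2, x4 = 120)
x5_row2          # 258
x5_row2 <= 324   # TRUE, so this row is within its limit in X6
```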

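Edit

The wording of my question may not have been the clearest. Below is an example of the code I'm currently using, followed by how it fails to meet the conditions I set out above.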

homeType=rep(c("a", "b"), times=c(39, 380))
H <- vector(mode="list", length(homeType))
for(i in seq(H)){
  H[[i]]$type <- homeType[i]
  H[[i]]$n <- 0
}

# Place people in houses up to max number of people
npeople <- 889
for(i in seq(npeople)){
  placed_in_house <- FALSE
  while(!placed_in_house){
    house_num <- sample(length(H), 1)
    if(H[[house_num]]$type == "a"){
      if(H[[house_num]]$n < 7){
        H[[house_num]]$n <- H[[house_num]]$n + 1
        placed_in_house <- TRUE
      }
    }
    if(H[[house_num]]$type == "b"){
      if(H[[house_num]]$n < 6){
        H[[house_num]]$n <- H[[house_num]]$n + 1
        placed_in_house <- TRUE
      }
    }
  }
}

# move people around to get up to min number of people
for(i in seq(H)){
  while(H[[i]]$n < 2){
    knock_on_door <- sample(length(H), 1)
    if( H[[knock_on_door]]$n > 2){
      H[[i]]$n <- H[[i]]$n + 1 # house i takes 1 person
      H[[knock_on_door]]$n <- H[[knock_on_door]]$n - 1 # house knock_on_door loses 1 person
    }
  }
}

Ha <- H[which(lapply(H, function(x){x$type}) == "a")]
Hb <- H[which(lapply(H, function(x){x$type}) == "b")]

Ha_T <- data.frame(t(table(data.frame(matrix(unlist(Ha), nrow=length(Ha), byrow=T)))))
Hb_T <- data.frame(t(table(data.frame(matrix(unlist(Hb), nrow=length(Hb), byrow=T)))))

DF_1 <- data.frame(matrix(0, nrow = 7, ncol = 6))
DF_1[,1] <- c(2:7,"Sum")
DF_1[7,2] <- 39
DF_1[2:6,3] <- 2:6
DF_1[7,4] <- 380
DF_1[7,5] <- 889
DF_1[1:6,6] <- c(359, 324, 134, 31, 5, 2)
for(i in 1:nrow(Ha_T)){DF_1[as.numeric(as.character(Ha_T[i,1]))-1,2] <- Ha_T[i,3]}
for(i in 1:nrow(Hb_T)){DF_1[as.numeric(as.character(Hb_T[i,1])),4] <- Hb_T[i,3]}
DF_1$X5[1:6] <- (as.numeric(as.character(DF_1$X1[1:6]))*DF_1$X2[1:6])+(as.numeric(as.character(DF_1$X3[1:6]))*DF_1$X4[1:6])
DF_1$X7 <- DF_1$X2+DF_1$X4
DF_1[1,3:4] <- NA
DF_1[7,3] <- NA
DF_1[7,6] <- NA

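Using this example, the problem is row 2 in DF_1. The value in column X7 (X2 + X4) is greater than the permitted number shown in column X6. What I need is a solution where the values in X7 are less than or equal to the values in X6, while the sums of columns X2, X4 and X5 (X1 * X2 + X3 * X4) equal 39, 380 and 889 respectively (although these figures will change depending on the data used).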
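
The house-count condition can be sketched as a small check. The helper name `violating_rows` and the counts in `houses_a` and `houses_b` are made up for illustration (they sum to 39 and 380 but are not output from the code above); only `max_houses` comes from column X6.

```r
# Hypothetical helper, not from the original post: returns the rows in which
# the number of houses (a + b) exceeds the permitted maximum.
violating_rows <- function(houses_a, houses_b, max_houses) {
  which(houses_a + houses_b > max_houses)
}

max_houses <- c(359, 324, 134, 31, 5, 2)   # column X6 from the question
houses_a   <- c(5, 10, 12, 8, 3, 1)        # made-up counts (column X2), sum 39
houses_b   <- c(0, 330, 40, 8, 2, 0)       # made-up counts (column X4), sum 380

bad <- violating_rows(houses_a, houses_b, max_houses)
bad   # 2: row 2 holds 340 houses but only 324 are allowed
```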

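2 answers:

Answer 0 (score: 1)

The problem as originally described in the question cannot be satisfied, because no set of values meets all of these constraints at once.

"So what I'm after for each row is a value of X1 * X2 + X3 * X4 (making column X5) that is less than or equal to X6, with the sum of X2 being 39, the sum of X4 being 380 and the sum of X5 being 889."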

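One way to see this, assuming the quoted wording is read literally (every row's X5 must not exceed its X6): the row totals can then add up to at most the sum of X6, which is below 889. A quick check of that arithmetic:

```r
# Column X6 from the question: the per-row maxima
max_per_row <- c(359, 324, 134, 31, 5, 2)

# If X5 <= X6 in every row, sum(X5) can be at most sum(X6)
upper_bound <- sum(max_per_row)
upper_bound          # 855
upper_bound >= 889   # FALSE, so sum(X5) = 889 cannot be reached
```
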
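However, after the problem was restated in the comments, the revised problem description can be solved as follows.

Update: solution based on the clarification of the problem in the comments

According to the clarification in the comments:

"I'm not actually filling the houses completely. I'm only allocating the number of children to houses. That's why 'a' is 2 to 7 and 'b' is 2 to 6, because 'a' households will also include 1 adult and 'b' households 2. For a given area I know how many 2 to 8 person households there are (419), and how many 2, 3, 4, 5, 6, 7 or 8 person households there are (359, 324, 134, 31, 5, 2). I also know the total number of households with 1 (39) or 2 (380) adults, and how many children there are (889 in my example)."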

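Based on this updated information, we can do the following, in a loop: 1) calculate how many houses of each type could still be allocated according to the criteria, 2) randomly select a house type that can still be allocated without violating the rules, and 3) repeat until all 889 children are in houses. Note that I have used more descriptive column names here to make the logic easier to follow: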

library(data.table)  # provides data.table() and the := operator used below

DT <- data.table(HS1 = 2:7, # type 1 house size
                 NH1 = 0,   # number of type 1 houses with children
                 HS2 = 1:6, # type 2 house size
                 NH2 = 0,   # number of type 2 houses with children
                 C = 0,     # number of children in houses
                 MaxNH = c(359, 324, 134, 31, 5, 2)) # maximum number of type1+type 2 houses
NR = DT[,.N]
set.seed(1234)
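repeat {
  DT[, c("NH1", "NH2", "C") := 0]  # start each attempt from an empty allocation, so a failed attempt cannot loop forever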
  while (DT[, sum(C) < 889]) {
    DT[, MaxH1 := (MaxNH - NH1 - NH2)]
    DT[, MaxH2 := (MaxNH - NH1 - NH2)]
    DT[1,MaxH2 := 0 ]
    DT[MaxH1 > 39 - sum(NH1), MaxH1 := 39 - sum(NH1)]
    DT[MaxH2 > 380- sum(NH2), MaxH2 := 380- sum(NH2)]
    if (DT[, sum(NH1)] >= 39)  DT[, MaxH1 := 0]
    if (DT[, sum(NH2)] >= 380) DT[, MaxH2 := 0]

    if (DT[, all(MaxH1==0) & all(MaxH2==0)]) { # check if it is not possible to assign anyone else to a group
      print("No solution found. Check constraints or try again")
      break
    }
    # If you wish to preferentially fill a particular type of house, then change the probability weights in the next line accordingly
    newgroup = sample(2*NR, 1, prob = DT[, c(MaxH1, MaxH2)])
    if (newgroup > NR) DT[rep(1:NR, 2)[newgroup], NH2 := NH2+1] else DT[rep(1:NR, 2)[newgroup], NH1 := NH1+1]

    DT[, C := HS1*NH1 + HS2*NH2]
  }
  if (DT[, sum(C)==889]) break
}

DT[,1:6, with=F]
#   HS1 NH1 HS2 NH2   C MaxNH 
#1:   2   7   1   0  14   359 
#2:   3   7   2 218 457   324 
#3:   4  14   3  76 284   134  
#4:   5   9   4  14 101    31  
#5:   6   2   5   3  27     5 
#6:   7   0   6   1   6     2 

colSums(DT[, .(NH1, NH2, C)])
# NH1 NH2   C 
#  39 312 889  
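
The printed result can be checked against the rules with a few lines of base R. This is only a sketch: the vectors are copied from the output above, and the check treats 39 and 380 as upper limits on the house counts, since NH2 sums to 312 rather than 380 in this run.

```r
# Values copied from the printed result above
HS1   <- 2:7                          # type 1 house sizes
HS2   <- 1:6                          # type 2 house sizes
NH1   <- c(7, 7, 14, 9, 2, 0)         # type 1 houses per row
NH2   <- c(0, 218, 76, 14, 3, 1)      # type 2 houses per row
MaxNH <- c(359, 324, 134, 31, 5, 2)   # maximum houses per row

children <- HS1 * NH1 + HS2 * NH2     # recomputes column C

checks <- c(
  houses_per_row = all(NH1 + NH2 <= MaxNH),
  type1_total    = sum(NH1) <= 39,
  type2_total    = sum(NH2) <= 380,
  children_total = sum(children) == 889
)
checks   # every entry is TRUE for the result shown above
```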

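Answer 1 (score: 0)

This code adds a check on whether the generated data meet the conditions. After each iteration it pauses so the user can decide whether to keep trying. For me, the selection process never dropped to 348 b houses with 2 people each, so the result always violated the second condition (fewer than 324 houses). Should the a and b house types be offset in df?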

df <- data.frame(a=2:7, afreq=0, b=c(0,2:6), bfreq=0, housed=0, houses=500, correct=c(359, 324, 134, 31, 5, 2))

homeType <- rep(c("a", "b"), times=c(39, 380)) # as defined in the question's code

H <- data.frame(type=homeType, n=0) # using df instead of lists, easier for me

npeople <- 889

while(any(df$houses > df$correct)){

    H <- data.frame(type=homeType, n=0)

    # This code is yours, changed to df
    for(i in 1:npeople){
      placed_in_house <- FALSE
      while(!placed_in_house){
        house_num <- sample(nrow(H), 1)
        if(H$type[house_num] == "a"){
          if(H$n[house_num] < 7){
            H$n[house_num] <- H$n[house_num] + 1
            placed_in_house <- TRUE
          }
        }
        if(H$type[house_num] == "b"){
          if(H$n[house_num] < 6){
            H$n[house_num] <- H$n[house_num] + 1
            placed_in_house <- TRUE
          }
        }
      }
    }

    # Subsets of houses with lack of people and possible sources
    # This is iterative to randomize the full dataset
    Hempty <- which(H$n < 2)
    Hfull <- which(H$n >= 2)
    k <- 1 # effort counter

    while(length(Hempty) > 0){
        for(hempty in Hempty){
            knock_on_door <- sample(Hfull, 1)
            H$n[knock_on_door] <- H$n[knock_on_door] - 1 # moves from a full house
            H$n[hempty] <- H$n[hempty] + 1 # moves into an empty house
        }
        Hempty <- which(H$n < 2)
        Hfull <- which(H$n >= 2)
        print(paste("Iteration:", k, ", remaining empty houses:", length(Hempty)))
        k <- k + 1
    }

    # Frequencies how many houses house how many people
    freqs <- data.frame(table(H))
    df$afreq[match(freqs$n[freqs$type == "a"], df$a)] <- freqs$Freq[freqs$type == "a"]
    df$bfreq[match(freqs$n[freqs$type == "b"], df$b)] <- freqs$Freq[freqs$type == "b"]

    df$housed <- df[,1]*df[,2] + df[,3]*df[,4]
    df$houses <- df$afreq + df$bfreq

    # Check what is wrong with the occupancy and let user have a say
    print(df)
    if(any(df$houses > df$correct)){
        readline("There are more houses with a number of occupants than permitted. Hit [enter]")
    }
}
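
The frequency step (`table()` followed by `match()`) can be easier to follow on a small example. The sketch below uses a made-up toy `H` with seven houses, not the real data; `sizes_a`, `sizes_b`, `afreq` and `bfreq` are illustrative stand-ins for the corresponding columns of `df`.

```r
# Toy data for illustration only (7 houses, not the real 419)
H <- data.frame(type = c("a", "a", "a", "b", "b", "b", "b"),
                n    = c(2, 3, 3, 2, 2, 4, 6))

# Long-format counts: one row per (type, n) combination, with a Freq column
freqs <- data.frame(table(H))

sizes_a <- 2:7         # same as df$a
sizes_b <- c(0, 2:6)   # same as df$b
afreq <- integer(length(sizes_a))
bfreq <- integer(length(sizes_b))

# match() finds the position of each occupancy value within the size vector
is_a <- freqs$type == "a"
is_b <- freqs$type == "b"
afreq[match(freqs$n[is_a], sizes_a)] <- freqs$Freq[is_a]
bfreq[match(freqs$n[is_b], sizes_b)] <- freqs$Freq[is_b]

afreq   # 1 2 0 0 0 0
bfreq   # 0 2 0 1 0 1
```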