Allocating across group members in data.table

Time: 2016-04-19 05:56:59

Tags: r data.table

I have a demand table that looks like this:

library(data.table)
set.seed(1)
DTd <- data.table(loc="L1", product="P1", cust=c("C1","C2","C3"), period=c("per1","per2","per3","per4"), qty=runif(12,min=0,max=100), key=c("loc","product","cust","period"))
DTd[]
#   loc product cust period      qty
#1:  L1      P1   C1   per1 12.97134
#2:  L1      P1   C1   per2 65.37663
#3:  L1      P1   C1   per3 34.21633
#4:  L1      P1   C1   per4 24.23550
#5:  L1      P1   C2   per1 85.68853
#6:  L1      P1   C2   per2 98.22407
#7:  L1      P1   C2   per3 92.24086
#8:  L1      P1   C2   per4 70.62672
#9:  L1      P1   C3   per1 62.12432
#10:  L1      P1   C3   per2 84.08788
#11:  L1      P1   C3   per3 82.67184
#12:  L1      P1   C3   per4 53.63538

The supply table looks like this:

DTs <- data.table(loc="L1", product="P1", period=c("per1","per2","per3","per4"), qty=runif(4,min=0,max=200), key=c("loc","product","period"))
DTs[]
#   loc product period       qty
#1:  L1      P1   per1   9.23293
#2:  L1      P1   per2  74.03622
#3:  L1      P1   per3 133.54770
#4:  L1      P1   per4 123.43913

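I need to allocate the supply to the corresponding demand in priority order, adding an "alloc" (allocated) column to the demand table. For the purposes of this example, assume that the smallest demand has the highest priority.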
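To make the rule concrete, here is a minimal base R sketch for a single period. The helper name `allocate_smallest_first` and the numbers are made up for illustration; they are not part of the tables above.

```r
# Hypothetical helper: hand out one period's supply to the demands,
# serving the smallest demand first until the supply runs out.
allocate_smallest_first <- function(demand, supply) {
  alloc <- numeric(length(demand))
  for (i in order(demand)) {            # visit demands from smallest to largest
    alloc[i] <- min(demand[i], supply)  # give what is asked for, or what is left
    supply <- supply - alloc[i]         # reduce the amount outstanding
  }
  names(alloc) <- names(demand)
  alloc
}

# Made-up demands for three customers and a supply of 60
demand <- c(C1 = 40, C2 = 10, C3 = 30)
alloc <- allocate_smallest_first(demand, supply = 60)
print(alloc)
```

With these numbers C2 and C3 are served in full, and C1 receives only the remaining 20.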

This is the result I am looking for.

#   loc product cust period      qty     alloc
#1:  L1      P1   C1   per1 12.97134  9.232930
#2:  L1      P1   C1   per2 65.37663 65.376625
#3:  L1      P1   C1   per3 34.21633 34.216329
#4:  L1      P1   C1   per4 24.23550 24.235499
#5:  L1      P1   C2   per1 85.68853  0.000000
#6:  L1      P1   C2   per2 98.22407  0.000000
#7:  L1      P1   C2   per3 92.24086 16.659531
#8:  L1      P1   C2   per4 70.62672 45.568249
#9:  L1      P1   C3   per1 62.12432  0.000000
#10:  L1      P1   C3   per2 84.08788  8.659591
#11:  L1      P1   C3   per3 82.67184 82.671841
#12:  L1      P1   C3   per4 53.63538 53.635379

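I don't see a way to do this efficiently with data.table functionality. I seem to be reduced to looping over the rows and updating them one at a time with set. Here is the code I am using in this case.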

#set key on demand to match supply and order by the qty (for prioritising)
setkey(DTd, loc, product, period, qty)
#add a column for the allocated quantity
DTd[,alloc:=0]
#loop through the rows of the supply, using the row number
for (s in DTs[, .I]) {
    key <- DTs[s, .(loc, product, period)]
    suppqty <- DTs[s, qty]
    #loop through the corresponding demand and return the row number
    for (d in DTd[key, which=TRUE]) {
        if (suppqty == 0) break
        #determine the quantity to allocate from the demand row
        allocqty <- DTd[d, ifelse(qty < suppqty, qty, suppqty)]
        #update the alloc qty on this row
        set(DTd, d, "alloc", allocqty)
        #reduce the amount outstanding
        suppqty <- suppqty - allocqty
    }
}
#restore the original keys
setkey(DTd, loc, product, cust, period)

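Any suggestions for implementing any part of this better would be greatly appreciated. (In practice the tables are very large and the prioritisation rules can be quite complex, but in that case I would make a first pass to determine the priority and then use it in the allocation pass.)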

1 answer:

Answer 0: (score: 3)

You can do

setnames(DTs, "qty", "suppqty")
setnames(DTd, "qty", "demqty")
setorder(DTd, loc, product, period, demqty) # put your priority column last here

DTd[DTs, alloc := {
  resid_supply = shift(pmax(suppqty - cumsum(demqty), 0), fill=suppqty[1L])
  pmin(demqty, resid_supply)
}, by=.EACHI, on=c("loc", "product", "period")]

The result is

    loc product cust period    demqty     alloc
 1:  L1      P1   C2   per1 20.168193 20.168193
 2:  L1      P1   C1   per1 26.550866 26.550866
 3:  L1      P1   C3   per1 62.911404 62.911404
 4:  L1      P1   C1   per2  6.178627  6.178627
 5:  L1      P1   C2   per2 37.212390 37.212390
 6:  L1      P1   C3   per2 89.838968 33.429727
 7:  L1      P1   C2   per3 20.597457 20.597457
 8:  L1      P1   C3   per3 57.285336 57.285336
 9:  L1      P1   C1   per3 94.467527 76.085490
10:  L1      P1   C3   per4 17.655675 17.655675
11:  L1      P1   C2   per4 66.079779 66.079779
12:  L1      P1   C1   per4 90.820779 15.804394
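
To see why the cumsum/shift expression works, it can help to run it on one group by hand. This is only a sketch with made-up numbers; `left_after` is a name introduced here for the intermediate step, and the demands are assumed to be already sorted by priority.

```r
library(data.table)

# Made-up single group: three demands in priority order, one supply quantity
demqty  <- c(10, 30, 50)
suppqty <- 60

# Supply left over AFTER each demand has been served (never below zero)
left_after <- pmax(suppqty - cumsum(demqty), 0)    # 50 20 0

# Supply available BEFORE each demand: lag by one row,
# and the first row sees the full supply
resid_supply <- shift(left_after, fill = suppqty)  # 60 50 20

# Each row gets its demand, capped by what is still available
alloc <- pmin(demqty, resid_supply)                # 10 30 20
print(alloc)
```

Inside the join, by=.EACHI runs this once per supply row, so suppqty is a single value for each loc/product/period group.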

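As one of the package authors, Arun, describes in this SO post, you generally no longer need to set keys before merging:

So in most cases there is no longer any need to set keys. We recommend using on= wherever possible, unless setting a key gives a significant performance improvement that you want to exploit.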
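
As a rough illustration of that point, the sketch below runs the same join twice, once ad hoc with on= and once on keyed copies. The tables `demand` and `supply` and their values are made up for this example.

```r
library(data.table)

# Made-up tables, not the ones from the question
demand <- data.table(period = c("per1", "per1", "per2"),
                     cust   = c("C1", "C2", "C1"),
                     demqty = c(5, 7, 3))
supply <- data.table(period  = c("per1", "per2"),
                     suppqty = c(10, 2))

# Ad hoc join: name the join column with on=, no key is set on either table
joined_on <- supply[demand, on = "period"]

# Keyed join: setkey() sorts the copies by period and marks them as keyed
demand_k <- copy(demand)
supply_k <- copy(supply)
setkey(demand_k, period)
setkey(supply_k, period)
joined_key <- supply_k[demand_k]

print(joined_on)
print(joined_key)
```

Both joins attach the same suppqty to each demand row; the on= version simply leaves the original tables unkeyed and in their original order.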

For a similar calculation (buying at the lowest price first), see my other answer.