The scenario: there are n objects of different sizes, unevenly distributed over m buckets. The size of a bucket is the sum of the sizes of all the objects it contains. It so happens that the bucket sizes currently vary widely.
If I want to spread the objects evenly over these buckets, so that the total size of each bucket is roughly the same, what would be a good algorithm? It would be nice if the algorithm leaned towards moving less total size rather than achieving a perfectly even spread.
I have this naive, ineffective, and buggy solution in Ruby:
buckets = [ [10, 4, 3, 3, 2, 1], [5, 5, 3, 2, 1], [3, 1, 1], [2] ]
avg_size = buckets.flatten.reduce(:+) / buckets.count + 1
large_buckets = buckets.take_while {|arr| arr.reduce(:+) >= avg_size}.to_a
large_buckets.each do |large|
  smallest = buckets.last
  until ((small_sum = smallest.reduce(:+)) >= avg_size)
    break if small_sum + large.last >= avg_size
    smallest << large.pop
  end
  buckets.insert(0, buckets.pop)
end
=> [[3, 1, 1, 1, 2, 3], [2, 1, 2, 3, 3], [10, 4], [5, 5]]
Answer 0 (score: 16)
I think this is a variant of the bin packing problem, so it is NP-hard. Your answer is essentially a variant of the first fit decreasing heuristic, which is a pretty good heuristic. That said, I believe the following would produce better results.
The idea is that by moving the largest elements in Pass 1, you make it easier to match the bucket sizes more precisely in Pass 2. You use a balanced binary tree so that the buckets, or the tree of buckets, can be re-indexed quickly after an element is removed or added, but you could use a linked list instead (a balanced binary tree has better worst-case performance, but a linked list may have better average-case performance). By doing a best fit instead of a first fit in Pass 2, you are less likely to make useless moves (e.g. moving an object of size 10 from a bucket that is 5 over the average to a bucket that is 5 under the average: a first fit would blindly perform that move, whereas a best fit would either query the next "too large" bucket for a better-sized object or drop the "too small" bucket from the bucket tree).
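For illustration, here is a minimal Python sketch of the best-fit matching described for Pass 2. It uses plain lists instead of the balanced tree (so each lookup is linear rather than logarithmic), skips the largest-first Pass 1, and the stopping rule plus the "do not push a donor below the average" check are my own choices, not part of the answer:

def best_fit_rebalance(buckets):
    # Average target size per bucket.
    avg = sum(sum(b) for b in buckets) / len(buckets)
    while True:
        receiver = min(buckets, key=sum)
        deficit = avg - sum(receiver)
        if deficit <= 0:
            break
        best_obj, best_donor = None, None
        # Best fit: among all over-average buckets, find the single object
        # whose size is closest to the deficit, without pushing the donor
        # below the average or overshooting the deficit by more than it fills.
        for donor in buckets:
            if donor is receiver or sum(donor) <= avg:
                continue
            for obj in donor:
                if sum(donor) - obj < avg or obj > 2 * deficit:
                    continue
                if best_obj is None or abs(deficit - obj) < abs(deficit - best_obj):
                    best_obj, best_donor = obj, donor
        if best_obj is None:        # no helpful move left
            break
        best_donor.remove(best_obj)
        receiver.append(best_obj)
    return buckets

On the question's example this settles at bucket sums of 12, 12, 10 and 12; the tree-backed version the answer describes would find the same kind of moves with less searching.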
Answer 1 (score: 7)
I ended up with something like this.
Ruby code example:
require 'pp'

def average_size(buckets)
  (buckets.flatten.reduce(:+).to_f / buckets.count + 0.5).to_i
end

def spread_evenly(buckets)
  average = average_size(buckets)
  large_buckets = buckets.take_while {|arr| arr.reduce(:+) >= average}.to_a
  large_buckets.each do |large_bucket|
    smallest_bucket = buckets.last
    smallest_size = smallest_bucket.reduce(:+)
    large_size = large_bucket.reduce(:+)
    until (smallest_size >= average)
      break if large_size <= average
      if smallest_size + large_bucket.last > average and large_size > average
        buckets.unshift buckets.pop
        smallest_bucket = buckets.last
        smallest_size = smallest_bucket.reduce(:+)
      end
      smallest_size += smallest_object = large_bucket.pop
      large_size -= smallest_object
      smallest_bucket << smallest_object
    end
    buckets.unshift buckets.pop if smallest_size >= average
  end
  buckets
end

test_buckets = [
  [ [10, 4, 3, 3, 2, 1], [5, 5, 3, 2, 1], [3, 1, 1], [2] ],
  [ [4, 3, 3, 2, 2, 2, 2, 1, 1], [10, 5, 3, 2, 1], [3, 3, 3], [6] ],
  [ [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1], [1, 1] ],
  [ [10, 9, 8, 7], [6, 5, 4], [3, 2], [1] ],
]

test_buckets.each do |buckets|
  puts "Before spread with average of #{average_size(buckets)}:"
  pp buckets
  result = spread_evenly(buckets)
  puts "Result and sum of each bucket:"
  pp result
  sizes = result.map {|bucket| bucket.reduce :+}
  pp sizes
  puts
end
Output:
Before spread with average of 12:
[[10, 4, 3, 3, 2, 1], [5, 5, 3, 2, 1], [3, 1, 1], [2]]
Result and sum of each bucket:
[[3, 1, 1, 4, 1, 2], [2, 1, 2, 3, 3], [10], [5, 5, 3]]
[12, 11, 10, 13]
Before spread with average of 14:
[[4, 3, 3, 2, 2, 2, 2, 1, 1], [10, 5, 3, 2, 1], [3, 3, 3], [6]]
Result and sum of each bucket:
[[3, 3, 3, 2, 3], [6, 1, 1, 2, 2, 1], [4, 3, 3, 2, 2], [10, 5]]
[14, 13, 14, 15]
Before spread with average of 4:
[[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1], [1, 1]]
Result and sum of each bucket:
[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
[4, 4, 4, 4, 4]
Before spread with average of 14:
[[10, 9, 8, 7], [6, 5, 4], [3, 2], [1]]
Result and sum of each bucket:
[[1, 7, 9], [10], [6, 5, 4], [3, 2, 8]]
[17, 10, 15, 13]
Answer 2 (score: 6)
This is not bin packing as others have suggested. There the bin size is fixed and you try to minimise the number of bins. Here you are trying to minimise the variance among a fixed number of bins.
It turns out this is equivalent to Multiprocessor Scheduling, and, according to the reference, the algorithm below (known as "Longest Job First" or "Longest Processing Time First") is guaranteed to produce a largest sum no more than 4/3 - 1/(3m) times optimal, where m is the number of buckets. In the test cases that is 4/3 - 1/12 = 5/4, i.e. within 25% of optimal.
We simply start with all bins empty and put each item, in decreasing order of size, into the currently least full bin. We can track the least full bin efficiently with a min heap. With a heap offering logarithmic insert and delete-min, the algorithm runs in O(n log m) time (n and m defined as @JonasElfström uses them). Ruby is quite expressive here: the algorithm itself is only 9 sloc.
Here is the code. I'm not a Ruby expert, so please feel free to suggest better ways. I am using @JonasElfström's test cases.
require 'algorithms'
require 'pp'

test_buckets = [
  [ [10, 4, 3, 3, 2, 1], [5, 5, 3, 2, 1], [3, 1, 1], [2] ],
  [ [4, 3, 3, 2, 2, 2, 2, 1, 1], [10, 5, 3, 2, 1], [3, 3, 3], [6] ],
  [ [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1], [1, 1] ],
  [ [10, 9, 8, 7], [6, 5, 4], [3, 2], [1] ],
]

def relevel(buckets)
  q = Containers::PriorityQueue.new { |x, y| x < y }
  # Initially all buckets to be returned are empty and so have zero sums.
  rtn = Array.new(buckets.length) { [] }
  buckets.each_index {|i| q.push(i, 0) }
  sums = Array.new(buckets.length, 0)
  # Add to emptiest bucket in descending order.
  # Bang! ops would generate less garbage.
  buckets.flatten.sort.reverse.each do |val|
    i = q.pop                 # Get index of emptiest bucket
    rtn[i] << val             # Append current value to it
    q.push(i, sums[i] += val) # Update sums and min heap
  end
  rtn
end

test_buckets.each {|b| pp relevel(b).map {|a| a.inject(:+) }}
Results:
[12, 11, 11, 12]
[14, 14, 14, 14]
[4, 4, 4, 4, 4]
[13, 13, 15, 14]
Answer 3 (score: 3)
You can use my answer to fitting n variable height images into 3 (similar length) column layout.
Mentally map the images to your objects and the column heights to your bucket sizes, and the rest of that solution should apply...
The following uses the first_fit algorithm mentioned by Robin Green earlier, but then improves on it with greedy swapping.
The swapping routine finds the column that is furthest from the average column height, then systematically looks for a swap between one of its pictures and the first picture in another column that minimises the maximum deviation from the average.
I used a random sample of 30 pictures with heights in the range 5 to 50 "units". In my case convergence was quick, and it improved significantly on the first_fit algorithm.
The code (Python 3.2):
def first_fit(items, bincount=3):
    items = sorted(items, reverse=1) # New - improves first fit.
    bins = [[] for c in range(bincount)]
    binsizes = [0] * bincount
    for item in items:
        minbinindex = binsizes.index(min(binsizes))
        bins[minbinindex].append(item)
        binsizes[minbinindex] += item
    average = sum(binsizes) / float(bincount)
    maxdeviation = max(abs(average - bs) for bs in binsizes)
    return bins, binsizes, average, maxdeviation

def swap1(columns, colsize, average, margin=0):
    'See if you can do a swap to smooth the heights'
    colcount = len(columns)
    maxdeviation, i_a = max((abs(average - cs), i)
                            for i,cs in enumerate(colsize))
    col_a = columns[i_a]
    for pic_a in set(col_a): # use set as if same height then only do once
        for i_b, col_b in enumerate(columns):
            if i_a != i_b: # Not same column
                for pic_b in set(col_b):
                    if (abs(pic_a - pic_b) > margin): # Not same heights
                        # new heights if swapped
                        new_a = colsize[i_a] - pic_a + pic_b
                        new_b = colsize[i_b] - pic_b + pic_a
                        if all(abs(average - new) < maxdeviation
                               for new in (new_a, new_b)):
                            # Better to swap (in-place)
                            colsize[i_a] = new_a
                            colsize[i_b] = new_b
                            columns[i_a].remove(pic_a)
                            columns[i_a].append(pic_b)
                            columns[i_b].remove(pic_b)
                            columns[i_b].append(pic_a)
                            maxdeviation = max(abs(average - cs)
                                               for cs in colsize)
                            return True, maxdeviation
    return False, maxdeviation

def printit(columns, colsize, average, maxdeviation):
    print('columns')
    pp(columns)
    print('colsize:', colsize)
    print('average, maxdeviation:', average, maxdeviation)
    print('deviations:', [abs(average - cs) for cs in colsize])
    print()

if __name__ == '__main__':
    ## Some data
    #import random
    #heights = [random.randint(5, 50) for i in range(30)]
    ## Here's some from the above, but 'fixed'.
    from pprint import pprint as pp
    heights = [45, 7, 46, 34, 12, 12, 34, 19, 17, 41,
               28, 9, 37, 32, 30, 44, 17, 16, 44, 7,
               23, 30, 36, 5, 40, 20, 28, 42, 8, 38]
    columns, colsize, average, maxdeviation = first_fit(heights)
    printit(columns, colsize, average, maxdeviation)
    while 1:
        swapped, maxdeviation = swap1(columns, colsize, average, maxdeviation)
        printit(columns, colsize, average, maxdeviation)
        if not swapped:
            break
        #input('Paused: ')
Output:
columns
[[45, 12, 17, 28, 32, 17, 44, 5, 40, 8, 38],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 34, 9, 37, 44, 30, 20, 28]]
colsize: [286, 267, 248]
average, maxdeviation: 267.0 19.0
deviations: [19.0, 0.0, 19.0]
columns
[[45, 12, 17, 28, 17, 44, 5, 40, 8, 38, 9],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 34, 37, 44, 30, 20, 28, 32]]
colsize: [263, 267, 271]
average, maxdeviation: 267.0 4.0
deviations: [4.0, 0.0, 4.0]
columns
[[45, 12, 17, 17, 44, 5, 40, 8, 38, 9, 34],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 37, 44, 30, 20, 28, 32, 28]]
colsize: [269, 267, 265]
average, maxdeviation: 267.0 2.0
deviations: [2.0, 0.0, 2.0]
columns
[[45, 12, 17, 17, 44, 5, 8, 38, 9, 34, 37],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 44, 30, 20, 28, 32, 28, 40]]
colsize: [266, 267, 268]
average, maxdeviation: 267.0 1.0
deviations: [1.0, 0.0, 1.0]
columns
[[45, 12, 17, 17, 44, 5, 8, 38, 9, 34, 37],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 44, 30, 20, 28, 32, 28, 40]]
colsize: [266, 267, 268]
average, maxdeviation: 267.0 1.0
deviations: [1.0, 0.0, 1.0]
Nice problem.
Here is the information on the reverse sort mentioned in my separate comment below.
>>> h = sorted(heights, reverse=1)
>>> h
[46, 45, 44, 44, 42, 41, 40, 38, 37, 36, 34, 34, 32, 30, 30, 28, 28, 23, 20, 19, 17, 17, 16, 12, 12, 9, 8, 7, 7, 5]
>>> columns, colsize, average, maxdeviation = first_fit(h)
>>> printit(columns, colsize, average, maxdeviation)
columns
[[46, 41, 40, 34, 30, 28, 19, 12, 12, 5],
[45, 42, 38, 36, 30, 28, 17, 16, 8, 7],
[44, 44, 37, 34, 32, 23, 20, 17, 9, 7]]
colsize: [267, 267, 267]
average, maxdeviation: 267.0 0.0
deviations: [0.0, 0.0, 0.0]
With the reverse sort in place, this extra code appended to the bottom of the code above (inside the 'if __name__ == ...' section) runs extra trials on random data:
for trial in range(2,11):
    print('\n## Trial %i' % trial)
    heights = [random.randint(5, 50) for i in range(random.randint(5, 50))]
    print('Pictures:',len(heights))
    columns, colsize, average, maxdeviation = first_fit(heights)
    print('average %7.3f' % average, '\nmaxdeviation:')
    print('%5.2f%% = %6.3f' % ((maxdeviation * 100. / average), maxdeviation))
    swapcount = 0
    while maxdeviation:
        swapped, maxdeviation = swap1(columns, colsize, average, maxdeviation)
        if not swapped:
            break
        print('%5.2f%% = %6.3f' % ((maxdeviation * 100. / average), maxdeviation))
        swapcount += 1
    print('swaps:', swapcount)
The extra output shows the effect of the swaps:
## Trial 2
Pictures: 11
average 72.000
maxdeviation:
9.72% = 7.000
swaps: 0
## Trial 3
Pictures: 14
average 118.667
maxdeviation:
6.46% = 7.667
4.78% = 5.667
3.09% = 3.667
0.56% = 0.667
swaps: 3
## Trial 4
Pictures: 46
average 470.333
maxdeviation:
0.57% = 2.667
0.35% = 1.667
0.14% = 0.667
swaps: 2
## Trial 5
Pictures: 40
average 388.667
maxdeviation:
0.43% = 1.667
0.17% = 0.667
swaps: 1
## Trial 6
Pictures: 5
average 44.000
maxdeviation:
4.55% = 2.000
swaps: 0
## Trial 7
Pictures: 30
average 295.000
maxdeviation:
0.34% = 1.000
swaps: 0
## Trial 8
Pictures: 43
average 413.000
maxdeviation:
0.97% = 4.000
0.73% = 3.000
0.48% = 2.000
swaps: 2
## Trial 9
Pictures: 33
average 342.000
maxdeviation:
0.29% = 1.000
swaps: 0
## Trial 10
Pictures: 26
average 233.333
maxdeviation:
2.29% = 5.333
1.86% = 4.333
1.43% = 3.333
1.00% = 2.333
0.57% = 1.333
swaps: 4
Answer 4 (score: 1)
Adapt a knapsack-problem solving algorithm, for example by specifying each bucket's "weight" to be roughly the average of the n object sizes (try a Gaussian distribution around that average).
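One way to read this (my interpretation: treat the "weight" as a per-bucket capacity equal to the average bucket size, i.e. total size divided by the number of buckets) is to fill one bucket at a time with a subset-sum / 0-1 knapsack DP over the remaining objects. A rough sketch:

def best_subset(objects, capacity):
    # Subset-sum DP: reachable[s] holds one set of objects totalling s <= capacity.
    reachable = {0: []}
    for obj in objects:
        for total, chosen in list(reachable.items()):
            t = total + obj
            if t <= capacity and t not in reachable:
                reachable[t] = chosen + [obj]
    return reachable[max(reachable)]   # largest total not exceeding the capacity

def knapsack_fill(objects, bucket_count):
    # Fill bucket_count - 1 buckets as close to the target as possible;
    # whatever remains goes into the last bucket (which may end up off target).
    target = round(sum(objects) / bucket_count)
    remaining = list(objects)
    buckets = []
    for _ in range(bucket_count - 1):
        chosen = best_subset(remaining, target)
        for obj in chosen:
            remaining.remove(obj)
        buckets.append(chosen)
    buckets.append(remaining)
    return buckets

Note this packs every bucket from scratch, so it says nothing about minimising moves; the Gaussian spread around the average could be emulated by jittering the target per bucket.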
Answer 5 (score: 1)
Sort the buckets by size.
Move an object from the largest bucket into the smallest bucket, then re-sort the array (which is almost sorted, so we can use a "limited insertion sort" in both directions; you can also speed things up by noting where you placed the last two buckets being sorted: if you have 6-6-6-6-6-6-5... and take an object from the first bucket, you will move it to the sixth position, so on the next iteration you can start comparing from the fifth. The same holds, right to left, for the smallest buckets).
You can stop when the difference between the two buckets is 1.
This moves the minimum number of objects, but the comparisons are of order n^2 log n (the simplest version is n^3 log n). If moving objects is expensive while checking bucket sizes is not, for reasonable n it may still do (a sketch follows after the traces below):
12 7 5 2
11 7 5 3
10 7 5 4
9 7 5 5
8 7 6 5
7 7 6 6
12 7 3 1
11 7 3 2
10 7 3 3
9 7 4 3
8 7 4 4
7 7 5 4
7 6 5 5
6 6 6 5
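A minimal Python sketch of this first variant. Which object to move is not pinned down above; taking the smallest object of the currently largest bucket is just one choice, and a full re-sort stands in for the limited insertion sort:

def move_largest_to_smallest(buckets):
    while True:
        order = sorted(range(len(buckets)), key=lambda i: sum(buckets[i]))
        small, big = order[0], order[-1]
        gap = sum(buckets[big]) - sum(buckets[small])
        if gap <= 1 or not buckets[big]:
            break
        obj = min(buckets[big])          # smallest object of the largest bucket
        if obj >= gap:                   # moving it could not narrow the gap
            break
        buckets[big].remove(obj)
        buckets[small].append(obj)
    return buckets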
Another possibility is to calculate the expected average size of every bucket and "move" a bag (or a further bucket) holding the surplus from the larger buckets over to the smaller ones.
Otherwise, funny things may happen:
12 7 3 1, the average is a bit less than 6, so we take 5 as the average.
5 7 3 1 bag = 7 from 1st bucket
5 5 3 1 bag = 9
5 5 5 1 bag = 7
5 5 5 8 which is a bit unbalanced.
Taking 6 instead (i.e. rounding up) works better, but sometimes it does not:
12 5 3 1
6 5 3 1 bag = 6 from 1st bucket
6 6 3 1 bag = 5
6 6 6 1 bag = 2
6 6 6 3 which again is unbalanced.
You can run two passes, the first from left to right using the rounded-up average, the other from right to left using the truncated average:
12 5 3 1 we want to get no more than 6 in each bucket
6 11 3 1
6 6 8 1
6 6 6 3
6 6 6 3 and now we want to get at least 5 in each bucket
6 6 4 5 (we have taken 2 from bucket #3 into bucket #4)
6 5 5 5 (when the difference is 1 we stop).
This will require n log n size checks, and no more than 2n object moves.
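Here is a sketch of those two passes operating on the bucket totals only, exactly as in the traces above; it shows how the ceil/floor targets and the "moving bag" interact, not which individual objects to move:

import math

def two_pass_level(sizes):
    sizes = list(sizes)
    hi = math.ceil(sum(sizes) / len(sizes))   # rounded-up average
    lo = sum(sizes) // len(sizes)             # truncated average
    bag = 0
    # Pass 1, left to right: cap every bucket at hi, carry the excess right.
    for i in range(len(sizes)):
        total = sizes[i] + bag
        sizes[i] = min(total, hi)
        bag = total - sizes[i]
    sizes[-1] += bag                          # any leftover stays in the last bucket
    # Pass 2, right to left: top each bucket up to lo from its left neighbour.
    for i in range(len(sizes) - 1, 0, -1):
        deficit = lo - sizes[i]
        if deficit > 0:
            sizes[i] += deficit
            sizes[i - 1] -= deficit
    return sizes

# two_pass_level([12, 5, 3, 1])  # -> [6, 5, 5, 5], matching the trace above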
Another interesting possibility is to reason like this: you have m objects to place into n buckets. So you need to map the integers m onto n, and that is Bresenham's line algorithm. Run an (n, m) Bresenham over the sorted array, and at step i (i.e. for the i-th bucket) the algorithm tells you whether to use a size of round(m/n) or floor(m/n). Then move objects to or from the "moving bag" according to the i-th bucket's size.
This requires n log n comparisons.
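One way to read the Bresenham analogy (my reading; the answer keeps it abstract) is that it decides, per bucket, whether the target should be the floor of the average or one more, spreading the remainder evenly the way the line algorithm spreads accumulated error:

def bresenham_targets(total, bucket_count):
    # Split `total` into bucket_count integer targets that differ by at most 1,
    # carrying the remainder forward like Bresenham's error term.
    targets, err = [], 0
    for _ in range(bucket_count):
        share = (total + err) // bucket_count   # floor(avg) or floor(avg) + 1
        targets.append(share)
        err = (total + err) % bucket_count
    return targets

# bresenham_targets(46, 4)  # -> [11, 12, 11, 12]

The targets can then drive the "moving bag": walk the sorted buckets, dropping each bucket's excess over its target into the bag and topping up deficits from it.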
You can further reduce the number of object moves by initially removing all buckets that are already of size round(m/n) or floor(m/n) into two pools of R-sized and F-sized buckets. When, while running the algorithm, you need the i-th bucket to hold R objects, swap the i-th bucket with one of the R-sized buckets if that pool is not empty. This way only the hopelessly undersized or oversized buckets get rebalanced; (most of) the others are simply ignored, apart from having their references shuffled.
If object access time is huge in proportion to computation time (e.g. some kind of autoloader magazine), this will yield a magazine that is as balanced as possible, with the absolute minimum number of overall object moves.
Answer 6 (score: 0)
You could use an integer programming package, if it is fast enough.
Getting the constraints right may be tricky. Something like the following should do it:
Let the variable Oij denote that Object i is in Bucket j, and let Wi be the weight (size) of Oi.
Constraints:
sum(Oij for all j) == 1 #each object is in only one bucket
Oij = 1 or 0. #object is either in bucket j or not in bucket j
sum(Oij * Wi for all i) <= X + R #restrict weight on buckets.
Objective:
minimize X
Note that R is a relaxation constant that you can tune depending on how much movement is needed and how much performance matters. Now the maximum bucket size is X + R.
The next step is to find the minimum possible amount of movement while keeping the bucket sizes below X + R.
Define a "stay" variable Si that controls whether Oi stays in bucket j. If Si is 0, it indicates that Oi stays where it was.
Constraints:
Si = 1 or 0.
Oij = 1 or 0.
Oij <= Si where j != original bucket of Object i
Oij != Si where j == original bucket of Object i
Sum(Oij for all j) == 1
Sum(Oij for all i) <= X + R
Objective:
minimize Sum(Si for all i)
Here Sum(Si for all i) represents the number of objects that have been moved.
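For what it's worth, here is a sketch of the first phase (minimise the largest bucket) in PuLP. The bucket data, the value of R and the use of the bundled CBC solver are illustrative assumptions; the second phase would add the binary Si variables and the constraints above, then minimise Sum(Si) with X held at the phase-one value.

# Sketch only: bucket contents, R, and the solver choice are assumptions.
import pulp

buckets = [[10, 4, 3, 3, 2, 1], [5, 5, 3, 2, 1], [3, 1, 1], [2]]
weights = [w for b in buckets for w in b]     # W_i for every object
n, m = len(weights), len(buckets)
R = 1                                         # relaxation constant

prob = pulp.LpProblem("rebalance_phase1", pulp.LpMinimize)
O = pulp.LpVariable.dicts("O", (range(n), range(m)), cat="Binary")
X = pulp.LpVariable("X", lowBound=0)

prob += X                                     # objective: minimise X
for i in range(n):                            # each object sits in exactly one bucket
    prob += pulp.lpSum(O[i][j] for j in range(m)) == 1
for j in range(m):                            # restrict the weight on each bucket
    prob += pulp.lpSum(weights[i] * O[i][j] for i in range(n)) <= X + R

prob.solve(pulp.PULP_CBC_CMD(msg=False))
result = [[weights[i] for i in range(n) if O[i][j].value() > 0.5] for j in range(m)]
print(result, [sum(b) for b in result])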