根据权重分配的算法,项目总数未知,保证良好的分布?

时间:2012-08-06 16:24:16

标签: algorithm distribution probability

我想将令牌分配到3个插槽中。

每个插槽都有一些重量:可能50%的令牌应该进入第一个插槽,30%应该进入第二个插槽,20%进入第三个插槽。

我不知道令牌的总数 - 他们继续前进。我可能会在中午获得1000个令牌,下午1点再获得300个令牌。等等,不可预测。在任何时候,我到目前为止收到的代币应该根据权重分配。

一种解决方案是按概率分配。我为每个令牌滚动了一个100面的模具。如果结果为1-50,则令牌进入插槽1. 51-80的结果表示插槽2,结果为81-100表示​​插槽3。

但这意味着,例如,每个令牌最终都会在第3个插槽中结束,这并非不可能(只是不可能)。

我想保证当我收到总共100个令牌时,其中恰好有50个将在插槽1中。当我收到1000个令牌时,恰好500个应该在插槽1中。

对此有什么好的算法?

2 个答案:

答案 0 :(得分:1)

根据理想分布计算每个槽中的误差。始终将令牌插入到错误最多的插槽中。如果绑定了两个或多个插槽,则插入其中的随机插槽。

错误是预期的令牌数(添加的令牌*比率)与实际令牌数之间的差异。

通过这种方式,您始终可以最大限度地减少错误,并且如果令牌能够准确分发,则不会出现错误。

演示代码(如果存在相同数量的错误而不是随机分发,则插入第一个插槽):

import random

tokens_in_slots = [0, 0, 0]
slot_distributions = [0.5, 0.3, 0.2]

def add_token():
    num_tokens = sum(tokens_in_slots)
    if not num_tokens:
        #first token can go anywhere
        tokens_in_slots[random.randint(0,2)] += 1
        return
    expected_tokens = [num_tokens*distr for distr in slot_distributions]
    errors = [expected - actual
              for expected, actual in zip(expected_tokens, tokens_in_slots)]
    most_error = max(enumerate(errors), key=lambda (i,e): e)
    tokens_in_slots[most_error[0]] += 1

def add_and_print(n):
    for i in xrange(n):
        add_token()
        print sum(tokens_in_slots), tokens_in_slots

结果:

>>> add_and_print(100)
1 [0, 0, 1]
2 [1, 0, 1]
3 [1, 1, 1]
4 [2, 1, 1]
5 [2, 2, 1]
6 [3, 2, 1]
7 [3, 2, 2]
8 [4, 2, 2]
9 [4, 3, 2]
10 [5, 3, 2]
11 [6, 3, 2]
12 [6, 4, 2]
13 [6, 4, 3]
14 [7, 4, 3]
15 [7, 5, 3]
16 [8, 5, 3]
17 [8, 5, 4]
18 [9, 5, 4]
19 [9, 6, 4]
20 [10, 6, 4]
21 [11, 6, 4]
22 [11, 7, 4]
23 [11, 7, 5]
24 [12, 7, 5]
25 [12, 8, 5]
26 [13, 8, 5]
27 [13, 8, 6]
28 [14, 8, 6]
29 [14, 9, 6]
30 [15, 9, 6]
31 [16, 9, 6]
32 [16, 10, 6]
33 [16, 10, 7]
34 [17, 10, 7]
35 [17, 11, 7]
36 [18, 11, 7]
37 [18, 11, 8]
38 [19, 11, 8]
39 [19, 12, 8]
40 [20, 12, 8]
41 [21, 12, 8]
42 [21, 13, 8]
43 [21, 13, 9]
44 [22, 13, 9]
45 [22, 14, 9]
46 [23, 14, 9]
47 [23, 14, 10]
48 [24, 14, 10]
49 [24, 15, 10]
50 [25, 15, 10]
51 [26, 15, 10]
52 [26, 16, 10]
53 [26, 16, 11]
54 [27, 16, 11]
55 [27, 17, 11]
56 [28, 17, 11]
57 [28, 17, 12]
58 [29, 17, 12]
59 [29, 18, 12]
60 [30, 18, 12]
61 [31, 18, 12]
62 [31, 19, 12]
63 [31, 19, 13]
64 [32, 19, 13]
65 [32, 20, 13]
66 [33, 20, 13]
67 [33, 20, 14]
68 [34, 20, 14]
69 [34, 21, 14]
70 [35, 21, 14]
71 [36, 21, 14]
72 [36, 22, 14]
73 [36, 22, 15]
74 [37, 22, 15]
75 [37, 23, 15]
76 [38, 23, 15]
77 [38, 23, 16]
78 [39, 23, 16]
79 [39, 24, 16]
80 [40, 24, 16]
81 [41, 24, 16]
82 [41, 25, 16]
83 [41, 25, 17]
84 [42, 25, 17]
85 [42, 26, 17]
86 [43, 26, 17]
87 [43, 26, 18]
88 [44, 26, 18]
89 [44, 27, 18]
90 [45, 27, 18]
91 [46, 27, 18]
92 [46, 28, 18]
93 [46, 28, 19]
94 [47, 28, 19]
95 [47, 29, 19]
96 [48, 29, 19]
97 [48, 29, 20]
98 [49, 29, 20]
99 [49, 30, 20]
100 [50, 30, 20]

的结果
tokens_in_slots = [0, 0, 0, 0]
slot_distributions = [0.8, 0.1, 0.05, 0.05]

>>> add_and_print(100)
1 [0, 0, 1, 0]
2 [1, 0, 1, 0]
3 [2, 0, 1, 0]
4 [3, 0, 1, 0]
5 [3, 1, 1, 0]
6 [4, 1, 1, 0]
7 [5, 1, 1, 0]
8 [6, 1, 1, 0]
9 [7, 1, 1, 0]
10 [7, 1, 1, 1]
11 [8, 1, 1, 1]
12 [9, 1, 1, 1]
13 [10, 1, 1, 1]
14 [11, 1, 1, 1]
15 [11, 2, 1, 1]
16 [12, 2, 1, 1]
17 [13, 2, 1, 1]
18 [14, 2, 1, 1]
19 [15, 2, 1, 1]
20 [16, 2, 1, 1]
21 [17, 2, 1, 1]
22 [17, 3, 1, 1]
23 [18, 3, 1, 1]
24 [19, 3, 1, 1]
25 [20, 3, 1, 1]
26 [20, 3, 2, 1]
27 [21, 3, 2, 1]
28 [22, 3, 2, 1]
29 [23, 3, 2, 1]
30 [23, 3, 2, 2]
31 [24, 3, 2, 2]
32 [25, 3, 2, 2]
33 [26, 3, 2, 2]
34 [27, 3, 2, 2]
35 [27, 4, 2, 2]
36 [28, 4, 2, 2]
37 [29, 4, 2, 2]
38 [30, 4, 2, 2]
39 [31, 4, 2, 2]
40 [32, 4, 2, 2]
41 [33, 4, 2, 2]
42 [33, 5, 2, 2]
43 [34, 5, 2, 2]
44 [35, 5, 2, 2]
45 [36, 5, 2, 2]
46 [36, 5, 3, 2]
47 [37, 5, 3, 2]
48 [38, 5, 3, 2]
49 [39, 5, 3, 2]
50 [39, 5, 3, 3]
51 [40, 5, 3, 3]
52 [41, 5, 3, 3]
53 [42, 5, 3, 3]
54 [43, 5, 3, 3]
55 [43, 6, 3, 3]
56 [44, 6, 3, 3]
57 [45, 6, 3, 3]
58 [46, 6, 3, 3]
59 [47, 6, 3, 3]
60 [48, 6, 3, 3]
61 [49, 6, 3, 3]
62 [49, 7, 3, 3]
63 [50, 7, 3, 3]
64 [51, 7, 3, 3]
65 [52, 7, 3, 3]
66 [52, 7, 4, 3]
67 [53, 7, 4, 3]
68 [54, 7, 4, 3]
69 [55, 7, 4, 3]
70 [55, 7, 4, 4]
71 [56, 7, 4, 4]
72 [57, 7, 4, 4]
73 [58, 7, 4, 4]
74 [59, 7, 4, 4]
75 [59, 8, 4, 4]
76 [60, 8, 4, 4]
77 [61, 8, 4, 4]
78 [62, 8, 4, 4]
79 [63, 8, 4, 4]
80 [64, 8, 4, 4]
81 [65, 8, 4, 4]
82 [65, 9, 4, 4]
83 [66, 9, 4, 4]
84 [67, 9, 4, 4]
85 [68, 9, 4, 4]
86 [68, 9, 5, 4]
87 [69, 9, 5, 4]
88 [70, 9, 5, 4]
89 [71, 9, 5, 4]
90 [71, 9, 5, 5]
91 [72, 9, 5, 5]
92 [73, 9, 5, 5]
93 [74, 9, 5, 5]
94 [75, 9, 5, 5]
95 [75, 10, 5, 5]
96 [76, 10, 5, 5]
97 [77, 10, 5, 5]
98 [78, 10, 5, 5]
99 [79, 10, 5, 5]
100 [80, 10, 5, 5]

答案 1 :(得分:0)

我想到的解决方案:

为每个位置计算得分。将令牌放入分数最高的插槽中。如果得分不止一个,我不在乎我们选择第一个还是随机选择。

计算得分的计算方法类似于以下Ruby /伪代码:

# Example values
# Floats to avoid integer division
slot_1_weight = 50.0
total_weight  = 100.0
slot_1_tokens = 2.0
total_tokens  = 3.0

if total_tokens == 0 || total_weight == 0 || slot_1_tokens
  # Avoid division by zero.
  slot_1_score = slot_1_weight
else
  expected_distribution = slot_1_weight/total_weight
  actual_distribution = slot_1_tokens/total_tokens
  slot_1_score  = slot_1_weight * (expected_distribution/actual_distribution)
end

因此,当预期和实际匹配时,分数是原始权重。如果预期太高,则重量会缩小。如果预期太低,则重量会增加。