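I'm using the "stars and bars" algorithm to select items from multiple lists, where the number of stars between bars k and k + 1 is the index into the k-th list. The problem I'm facing is that a partition (i.e. the number of stars between two bars) can be larger than the size of its list, which results in many invalid combinations.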
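For example: if I have two lists of length 8 each, (14,0) is a valid star distribution for sum = 14, but it of course exceeds the capacity of the first list. (7,7) is the highest valid index, so I get a large number of invalid indices, especially when the list sizes are unequal.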
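To make the two-list example concrete, here is a minimal sketch that enumerates every distribution of 14 stars over two partitions and keeps only those that are valid indices for two lists of length 8. The names list_len, total, all_dists and valid are illustrative only and do not appear in the code further down.

```python
list_len = 8   # both lists have 8 items, so valid indices are 0..7
total = 14     # number of stars to distribute

# With two partitions, every distribution has the form (i, total - i).
all_dists = [(i, total - i) for i in range(total + 1)]

# Keep only distributions whose parts are valid indices into both lists.
valid = [d for d in all_dists if all(part < list_len for part in d)]

print(len(all_dists), valid)  # 15 [(7, 7)]
```

Of the 15 distributions for this sum, only one is usable, which is the waste described above.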
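For performance reasons I need a variant of the algorithm with limited partition sizes. How can I do this? The stars-and-bars implementation I'm using right now is this one, but I can easily change it. The lists usually have similar lengths, but not necessarily the same length. Limiting the partition size to the length of the longest list would be fine, but individual limits would of course be better.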
import itertools

def stars_and_bars(stars, partitions):
    for c in itertools.combinations(range(stars+partitions-1), partitions-1):
        yield tuple(right-left-1 for left,right in zip((-1,) + c, c + (stars+partitions-1,)))

def get_items(*args):
    hits = 0
    misses = 0
    tries = 0
    max_idx = sum(len(a) - 1 for a in args)
    for dist in range(max_idx):
        for indices in stars_and_bars(dist, len(args)):
            try:
                tries += 1
                [arg[i] for arg,i in zip(args,indices)]
                hits += 1
            except IndexError:
                misses += 1
                continue
    print('hits/misses/tries: {}/{}/{}'.format(hits, misses, tries))

# Generate 4 lists of length 1..4
lists = [[None]*(r+1) for r in range(4)]
get_items(*lists)
# hits/misses/tries: 23/103/126
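As an illustration of how bar positions map to part sizes, the sketch below repeats the generator from above and lists what it yields for 2 stars and 3 partitions. The values 2 and 3 and the name result are chosen only for this example.

```python
import itertools

def stars_and_bars(stars, partitions):
    # Choose the bar positions among stars + partitions - 1 slots; the gaps
    # between consecutive bars (and the two ends) are the part sizes.
    for c in itertools.combinations(range(stars + partitions - 1), partitions - 1):
        yield tuple(right - left - 1
                    for left, right in zip((-1,) + c, c + (stars + partitions - 1,)))

result = list(stars_and_bars(2, 3))
for parts in result:
    print(parts)
# (0, 0, 2)
# (0, 1, 1)
# (0, 2, 0)
# (1, 0, 1)
# (1, 1, 0)
# (2, 0, 0)
```

Every tuple sums to the number of stars, but nothing bounds an individual part, which is why a part can exceed the length of its list.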
Edit: I found two related questions on mathexchange, but I haven't been able to translate them into code yet:
Answer 0 (score: 1)
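Based on this post, here is some code that generates the solutions efficiently. The main difference from that post is that the buckets now have individual limits and the number of buckets is fixed, so the number of solutions is not infinite.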
def find_partitions(x, lims):
    # partition the number x into a list of buckets;
    # the number of elements of each bucket i is strictly smaller than lims[i];
    # the sum of all buckets is x;
    # output the lists of buckets one by one
    # note: the same list object is yielded every time; copy it if you need to keep it
    a = [x] + [0 for l in lims[1:]]  # create an output array of the same length as lims, set a[0] to x
    while True:
        # step 1: while a[i] is too large: redistribute to a[i+1]
        i = 0
        while a[i] >= lims[i] and i < len(lims) - 1:
            a[i + 1] += a[i] - (lims[i] - 1)
            a[i] = (lims[i] - 1)
            i += 1
        if a[-1] >= lims[-1]:
            return  # the last bucket has too many elements: we've reached the last partition;
                    # this only happens when x is too large
        yield a
        # step 2: add one to group 1;
        #   while a group i is already full: set to 0 and increment group i+1;
        #   while the surplus is too large (because a[0] is too small): repeat incrementing
        i0 = 1
        surplus = 0
        while True:
            for i in range(i0, len(lims)):  # increment a[i] by 1, which can carry into a[i+1]
                if a[i] < lims[i]-1:
                    a[i] += 1
                    surplus += 1
                    break
                else:  # a[i] would become too full if 1 were added, therefore clear a[i] and increment a[i+1]
                    surplus -= a[i]
                    a[i] = 0
            else:  # the for-loop didn't find a small enough a[i]
                return
            if a[0] >= surplus:  # if a[0] is large enough to absorb the surplus, this step is done
                break
            else:  # a[0] would become negative when absorbing the surplus: set a[i0] to 0 and start incrementing a[i0+1]
                surplus -= a[i0]
                a[i0] = 0
                i0 += 1
                if i0 == len(lims):
                    return
        # step 3: a[0] should absorb the surplus created in step 2, although a[0] can become too large
        a[0] -= surplus

x = 11
lims = [5, 4, 3, 5]
for i, p in enumerate(find_partitions(x, lims)):
    print(f"partition {i+1}: {p} sums to {sum(p)} lex: { ''.join([str(i) for i in p[::-1]]) }")
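Called with x = 10 and the same lims, this produces the 19 solutions of 0<=a[0]<5, 0<=a[1]<4, 0<=a[2]<3, 0<=a[3]<5, a[0]+a[1]+a[2]+a[3] == 10 (read from right to left, they are in increasing lexical order):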
[4, 3, 2, 1]
[4, 3, 1, 2]
[4, 2, 2, 2]
[3, 3, 2, 2]
[4, 3, 0, 3]
[4, 2, 1, 3]
[3, 3, 1, 3]
[4, 1, 2, 3]
[3, 2, 2, 3]
[2, 3, 2, 3]
[4, 2, 0, 4]
[3, 3, 0, 4]
[4, 1, 1, 4]
[3, 2, 1, 4]
[2, 3, 1, 4]
[4, 0, 2, 4]
[3, 1, 2, 4]
[2, 2, 2, 4]
[1, 3, 2, 4]
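Each tuple listed above sums to 10. As a sanity check, a brute-force filter over itertools.product gives the same count of 19 for that total. This is only a sketch for verifying small cases, not the efficient generator; the name brute_force_partitions is made up for this example.

```python
import itertools

def brute_force_partitions(x, lims):
    # Try every index tuple within the limits and keep those summing to x.
    # Exponential in len(lims); only meant as a check on small inputs.
    ranges = [range(lim) for lim in lims]
    return [t for t in itertools.product(*ranges) if sum(t) == x]

solutions = brute_force_partitions(10, [5, 4, 3, 5])
print(len(solutions))  # 19
```

The brute-force version returns the tuples in a different order, so compare the results as sets rather than as sequences.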
In your test code, you can replace the line
for indices in stars_and_bars(dist, len(args)):
with
for indices in find_partitions(dist, limits):
where limits = [len(a) for a in args]. You will then get hits/misses/tries: 23/0/23. To get all 24 solutions, the for loop over dist should also include the last value: for dist in range(max_idx+1):
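Putting the two changes together might look like the sketch below. To keep it self-contained it does not use find_partitions; it substitutes a short recursive generator, here called bounded_distributions, which is an assumption of this example and yields the same set of tuples in a different order. count_items is likewise a hypothetical variant of get_items that returns the counters instead of printing them.

```python
def bounded_distributions(total, limits):
    """Yield tuples t with sum(t) == total and 0 <= t[k] < limits[k]."""
    if not limits:
        if total == 0:
            yield ()
        return
    # Largest total the buckets after the first one can still hold.
    rest_capacity = sum(lim - 1 for lim in limits[1:])
    lowest = max(0, total - rest_capacity)
    highest = min(total, limits[0] - 1)
    for value in range(lowest, highest + 1):
        for tail in bounded_distributions(total - value, limits[1:]):
            yield (value,) + tail

def count_items(*args):
    limits = [len(a) for a in args]
    max_idx = sum(lim - 1 for lim in limits)
    hits = misses = tries = 0
    for dist in range(max_idx + 1):  # + 1 so the largest index sum is included
        for indices in bounded_distributions(dist, limits):
            tries += 1
            try:
                [arg[i] for arg, i in zip(args, indices)]
                hits += 1
            except IndexError:
                misses += 1
    return hits, misses, tries

# Generate 4 lists of length 1..4
lists = [[None] * (r + 1) for r in range(4)]
print('hits/misses/tries: {}/{}/{}'.format(*count_items(*lists)))
# hits/misses/tries: 24/0/24
```

The bounds lowest and highest prune every branch that cannot reach the requested total, so no tuple is generated and then thrown away.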
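PS: If you just want all possible combinations of elements from the lists and don't care about getting the smallest indices first, itertools.product generates them: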
import itertools

lists = [['a'], ['b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i', 'j']]
for i, p in enumerate(itertools.product(*lists)):
    print(i+1, p)
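If the lists are small enough to hold every index tuple in memory, one simple way to still visit the smallest index sums first is to sort the output of itertools.product by sum. This is a sketch under that assumption, not a replacement for a lazy generator; the names index_tuples and items are illustrative.

```python
import itertools

lists = [['a'], ['b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i', 'j']]

# All index tuples, ordered by their sum (sorted() is stable, so ties keep
# the order in which itertools.product produced them).
index_tuples = sorted(itertools.product(*(range(len(lst)) for lst in lists)), key=sum)

for indices in index_tuples:
    items = tuple(lst[i] for lst, i in zip(lists, indices))
    print(sum(indices), indices, items)
```

Sorting materializes the whole product, so memory grows with the product of the list lengths.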