Question

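I need to create a list of groups of items, grouped so that the sum of the negative logs of the probabilities in each group is roughly 1.

So far I've come up with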

import itertools as iter
import math
import numpy as np

probs = np.random.dirichlet(np.ones(50)*100.,size=1).tolist()
logs = [-1 * math.log(1-x,2) for x in probs[0]]
zipped = zip(range(0,50), logs)

for key, igroup in iter.groupby(zipped, lambda x: x[1] < 1):
    print(list(igroup))

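I.e., I create a list of random numbers, take their negative logs, and then zip these values together with the item number.

I then want to create groups by adding together the numbers in the second column of the tuples until the sum is 1 (or slightly above it).

I've tried: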

for key, igroup in iter.groupby(zipped, lambda x: x[1]):
    for thing in igroup:
        print(list(iter.takewhile(lambda x: x < 1, iter.accumulate(igroup))))

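and various other variations using itertools.accumulate, but I can't get it to work.

Does anyone have an idea of what could be going wrong? (I think I'm doing too much work.)

Ideally, the output should be something like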

groups = [[1,2,3], [4,5], [6,7,8,9]]

etc., i.e. these are the groups that satisfy this property.

Answer 1

Using numpy.ufunc.accumulate and a simple loop:

import numpy as np

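def group(xs, start=1):
    last_sum = 0
    stop = start  # keeps the final check valid when xs is empty
    # stop is the exclusive end: one past the item just added to the running sum
    for stop, acc in enumerate(np.add.accumulate(xs), start + 1):
        if acc - last_sum >= 1:
            # the item that pushed the sum to 1 or above closes the group
            yield list(range(start, stop))
            last_sum = acc
            start = stop
    # leftover items whose sum never reached 1
    if start < stop:
        yield list(range(start, stop))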

probs = np.random.dirichlet(np.ones(50) * 100, size=1)
logs = -np.log2(1 - probs[0])
print(list(group(logs)))

Sample output:

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]]
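
Since the question was reaching for itertools.accumulate, here is a rough sketch of the same running-total idea without NumPy. The name greedy_groups and the sample values are made up for illustration, and I'm assuming the item that pushes the running sum to 1 or above should close its group:

```python
from itertools import accumulate

def greedy_groups(values, first=1):
    """Split item numbers into consecutive groups whose values sum to >= 1."""
    groups, current, last_total = [], [], 0.0
    for number, total in enumerate(accumulate(values), first):
        current.append(number)
        if total - last_total >= 1:  # sum since the last cut has reached 1
            groups.append(current)
            current, last_total = [], total
    if current:  # leftover items whose sum never reached 1
        groups.append(current)
    return groups

values = [0.5, 0.25, 0.5, 0.25, 0.75]  # made-up values, exact in binary floats
groups = greedy_groups(values)
sums = [sum(values[n - 1] for n in g) for g in groups]
print(groups)  # [[1, 2, 3], [4, 5]]
print(sums)    # [1.25, 1.0]
```

Printing the per-group sums like this is also a quick way to check that every full group really lands at 1 or slightly above.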

Alternative

Using numpy.searchsorted:

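def group(xs, idx_start=1):
    # running totals of the values
    xs = np.add.accumulate(xs)
    # index of the first running total that reaches each of 1, 2, 3, ...
    idxs = np.searchsorted(xs, np.arange(xs[-1]) + 1, side='left').tolist()
    # consecutive cut points delimit the groups, shifted to idx_start-based numbering
    return [list(range(i+idx_start, j+idx_start)) for i, j in zip([0] + idxs, idxs)]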
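
To see where this variant puts the boundaries, here is a small self-contained run on made-up values. The function is the same logic under a hypothetical name, group_by_thresholds, so it does not clash with the loop version. As far as I can tell, it cuts at fixed thresholds (1, 2, 3, ...) of the overall running total instead of restarting the sum after each group, and the item that crosses a threshold opens the next group:

```python
import numpy as np

def group_by_thresholds(xs, idx_start=1):
    totals = np.add.accumulate(xs)
    # index of the first running total that reaches each of 1, 2, 3, ...
    cuts = np.searchsorted(totals, np.arange(totals[-1]) + 1, side='left').tolist()
    return [list(range(i + idx_start, j + idx_start))
            for i, j in zip([0] + cuts, cuts)]

values = [0.5, 0.25, 0.5, 0.25, 0.75]  # made-up values, exact in binary floats
print(group_by_thresholds(values))  # [[1, 2], [3, 4], [5]]
```

On this input every group sums to 0.75, so the groups stay below 1 rather than slightly above it. Which behaviour you want depends on how strictly you read "roughly 1".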

Group tuple columns so that their sum is less than 1
