Generator for combinations in a special order

Question

I have the following recursive generator, which yields every combination of numbers from 0 to top-1:

def f(width, top):
  if width == 0:
    yield []
  else:
    for v in range(top):
      for subResult in f(width - 1, top):
        yield [ v ] + subResult

Calling f(3, 3) yields the values

[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 1, 0], [0, 1, 1], [0, 1, 2],
[0, 2, 0], [0, 2, 1], [0, 2, 2], [1, 0, 0], [1, 0, 1], [1, 0, 2],
[1, 1, 0], [1, 1, 1], [1, 1, 2], [1, 2, 0], [1, 2, 1], [1, 2, 2],
[2, 0, 0], [2, 0, 1], [2, 0, 2], [2, 1, 0], [2, 1, 1], [2, 1, 2],
[2, 2, 0], [2, 2, 1], [2, 2, 2]

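(Try calling it as list(f(3,3)) to get them as a list.)

What I need is the same values in a different order: I want the values sorted by their maximum value, i.e. first the value [0, 0, 0], then all values that have 1 as their maximum, i.e. [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], ..., then those containing 2, i.e. [0, 0, 2], [0, 1, 2], [0, 2, 0], [0, 2, 1], [0, 2, 2], [2, 0, 0], ..., and so on.

The generator must never yield a value twice (of course), and it must be possible to call it with very large arguments such as f(4, 1000) and then simply not exhaust it (so generating all values first and then sorting them by their maximum is out of the question).

The only approach I can think of is to first generate all values for f(w, 0), then for f(w, 1), then for f(w, 2), always skipping the values that were already yielded before, but I have a nagging feeling that there might be a better way: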

def g(width, top):
  for t in range(top):
    for v in f(width, t+1):
      if t in v:
        yield v

Any ideas?
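To make the requirement concrete, here is a minimal sketch of a checker that any candidate generator can be run against. The helper name `is_valid_order` is made up for this illustration, and it assumes width >= 1 (because `max` of an empty list is undefined). It confirms that the skip-based g above is correct, although for each t it still scans all (t+1)^width candidates.

```python
from itertools import product

def f(width, top):
    # Same generator as in the question.
    if width == 0:
        yield []
    else:
        for v in range(top):
            for sub in f(width - 1, top):
                yield [v] + sub

def g(width, top):
    # The skip-based approach from the question.
    for t in range(top):
        for v in f(width, t + 1):
            if t in v:
                yield v

def is_valid_order(values, width, top):
    """Hypothetical helper: every combination exactly once, maxima non-decreasing."""
    values = [tuple(v) for v in values]
    maxima = [max(v) for v in values]
    return (maxima == sorted(maxima)
            and len(values) == len(set(values))
            and set(values) == set(product(range(top), repeat=width)))

print(is_valid_order(g(3, 3), 3, 3))  # g yields the requested order
print(is_valid_order(f(3, 3), 3, 3))  # plain f does not
```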

Answer 1

def h(width,top,top_count):
    """
    Produce lists of length 'width' containing numbers from 0 to top-1,
    where top-1 occurs exactly top_count times.
    """
    if width == 0:
        yield []
    elif width == top_count:
        yield [top-1]*top_count
    else:
        for x in range(top-1):
            for result in h(width-1,top,top_count):
                yield [x]+result
        if top_count > 0:
            for result in h(width-1,top,top_count-1):
                yield [top-1]+result


def m(width,top):
    yield [0]*width
    for current_top in range(2,top+1):
        for top_count in range(1,width+1):
            print("=== h{}".format((width, current_top, top_count)))
            for result in h(width,current_top,top_count):
                print(result)
                yield result

ans = [x for x in m(3,3)]

Result:

=== h(3, 2, 1)
[0, 0, 1]
[0, 1, 0]
[1, 0, 0]
=== h(3, 2, 2)
[0, 1, 1]
[1, 0, 1]
[1, 1, 0]
=== h(3, 2, 3)
[1, 1, 1]
=== h(3, 3, 1)
[0, 0, 2]
[0, 1, 2]
[0, 2, 0]
[0, 2, 1]
[1, 0, 2]
[1, 1, 2]
[1, 2, 0]
[1, 2, 1]
[2, 0, 0]
[2, 0, 1]
[2, 1, 0]
[2, 1, 1]
=== h(3, 3, 2)
[0, 2, 2]
[1, 2, 2]
[2, 0, 2]
[2, 1, 2]
[2, 2, 0]
[2, 2, 1]
=== h(3, 3, 3)
[2, 2, 2]

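The print statements were added to show each call to the function h and its results. The comment on the h function should be clear enough to explain the general idea.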
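As a sanity check on the group sizes in the listing, a small sketch: each call to h should produce C(width, top_count) * (top-1)^(width - top_count) lists, since you choose the positions of top-1 and fill the rest with smaller values. The helper `expected_count` is a name introduced here, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb  # available from Python 3.8

def h(width, top, top_count):
    # Same recursion as in the answer above.
    if width == 0:
        yield []
    elif width == top_count:
        yield [top - 1] * top_count
    else:
        for x in range(top - 1):
            for result in h(width - 1, top, top_count):
                yield [x] + result
        if top_count > 0:
            for result in h(width - 1, top, top_count - 1):
                yield [top - 1] + result

def expected_count(width, top, top_count):
    # Hypothetical helper: positions for top-1 times fillings of the rest.
    return comb(width, top_count) * (top - 1) ** (width - top_count)

for top_count in range(1, 4):
    actual = len(list(h(3, 3, top_count)))
    print(top_count, actual, expected_count(3, 3, top_count))
```

For width 3 and top 3 the three counts add up to 19, which is 3^3 - 2^3, the number of lists whose maximum is exactly 2.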

Answer 2

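I found a solution myself. I first iterate over the top value, then generate all values that contain this top value one or more times. To do so, I loop over the number of top values (1 to width). For each such amount, I loop over all combinations of positions these top values can occupy. Then I fill these positions with the top value and fill the remaining positions with the plain product of all values below the top value.

The code looks like this: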

from itertools import product, combinations

def h(width, top):
  for t in range(top):
    for topAmount in range(1, width+1):  # how many top values are present?
      for topPositions in combinations(range(width), topAmount):
        for fillers in product(
            *[ range(t) for x in range(width-len(topPositions)) ]):
          fillers = list(fillers)
          yield [ t if i in topPositions else fillers.pop()
              for i in range(width) ]

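But I would still like to invite you to propose more elegant solutions. This looks like a brute-force approach to me, and the way I build the values is certainly not the cheapest I have seen.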
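On the cost of building each value: one possible tweak, sketched here under the assumption that only the construction step should change, is to walk the fillers with an iterator instead of copying them to a list and popping, and to use a set for the position test. The name `h_iter` is hypothetical; the enumeration idea is the same as in h above.

```python
from itertools import combinations, product

def h_iter(width, top):
    # Hypothetical variant of h: same loops, cheaper value construction.
    for t in range(top):
        for top_amount in range(1, width + 1):
            for top_positions in combinations(range(width), top_amount):
                positions = set(top_positions)   # constant-time membership test
                for fillers in product(range(t), repeat=width - top_amount):
                    it = iter(fillers)           # consume fillers left to right
                    yield [t if i in positions else next(it)
                           for i in range(width)]

print(list(h_iter(2, 2)))
```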

Answer 3

Growing cube idea

(Updated from the "diagonal" idea)

When I drew the task on paper, I got something like this:

 |0|1|2|3|
-|-|-|-|-|
0|a|b|c|d|
-|-|-|-|-|
1|b|b|c|d|
-|-|-|-|-|
2|c|c|c|d|
-|-|-|-|-|
3|d|d|d|d|
-|-|-|-|-|

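It shows only 2-D; in reality there are as many dimensions as there are numbers in each combination.

The letters a, b, c, d show the groups in which you want to get the combinations.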
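The grouping in the table can be reproduced by labelling each cell with the maximum of its coordinates. A tiny sketch (the variable names are chosen for this illustration only):

```python
# Label each cell (row, col) of the 2-D case by the maximum of its coordinates.
labels = "abcd"
rows = ["".join(labels[max(r, c)] for c in range(4)) for r in range(4)]
for row in rows:
    print(row)
```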

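What I want to say is that these groups form the surface of the corner of a growing n-dimensional cube.

All combinations are represented by the coordinates of all points in this cube (including the interior). Note that our coordinates take discrete values (0, 1, 2, ...), so there is a finite number of them.

If you find a rule for scanning all coordinates on the surface of that growing cube, you will have the requested generator.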
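One possible scanning rule, offered as a sketch rather than as what this answer had in mind: for each maximum t, split the points by the position i of the first occurrence of t. Everything before i must be below t, and everything after i may be anything up to t. The function name `shells` is made up here, and width >= 1 is assumed.

```python
from itertools import islice, product

def shells(width, top):
    """Hypothetical sketch: all lists over range(top), grouped by maximum."""
    for t in range(top):              # the cube grows from side t to side t + 1
        for i in range(width):        # i = position of the first occurrence of t
            for prefix in product(range(t), repeat=i):            # strictly below t
                for suffix in product(range(t + 1), repeat=width - i - 1):  # at most t
                    yield list(prefix) + [t] + list(suffix)

# Only the requested values are computed, so large arguments are fine.
print(list(islice(shells(4, 1000), 3)))
```

Each point is produced exactly once because the first occurrence of the maximum is unique, so no skipping or buffering is needed.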

Answer 4

I am fairly sure your function f yields the same values as itertools.product; i.e. I think you could replace f with:

from itertools import product

def f(width, top):
    for p in product(range(top), repeat=width):
        yield list(p)

To order the values as described in the question, you can simply use itertools.groupby:

from itertools import groupby
from collections import defaultdict

def group_by_max_value(x, y):
    grouped = defaultdict(list)
    for k, g in groupby(f(x, y), key=max):
        grouped[k].extend(list(g))
    return [grouped[k] for k in sorted(grouped.keys())]

A modified function definition, which can produce the sorted values without generating the entire sequence first:

from itertools import groupby, product
from collections import defaultdict

def lazy_group_by_max_value(width, top):
    grouped = defaultdict(list)
    # using `itertools.product` with a `range` object
    # guarantees that the product-tuples are emitted
    # in sorted order.
    ps = product(range(top), repeat=width)
    for k, g in groupby(ps, key=max):
        xs = list(g)
        grouped[k].extend(xs)
        # if xs[-1] is of the form (0, 0, .., 0), (1, 1, .., 1), .., (n, n, .., n) etc
        # then we have found all the maxes for `k`, because all future
        # sequences will contain at least one value which is greater than k.
        if set(xs[-1]) == {k}:
            # `pop` (ie. remove) the values from `grouped`
            # which are associated with key `k`.
            all_maxes_for_k = grouped.pop(k)
            for coll in all_maxes_for_k:
                yield coll
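One caveat worth checking before using the lazy version on large inputs: the group for maximum k is released only once (k, k, ..., k) has been read, so every tuple that precedes it in lexicographic order is scanned first, and those with a larger maximum stay buffered. A small sketch of how far that is (the helper name `rank_of_constant` is made up for this illustration):

```python
from itertools import product

def rank_of_constant(width, top, k):
    # 0-based lexicographic position of (k, k, ..., k) in
    # product(range(top), repeat=width), read as a base-`top` number.
    return sum(k * top ** i for i in range(width))

# Cross-check against the real product for a small case.
small = list(product(range(3), repeat=3))
print(small.index((1, 1, 1)), rank_of_constant(3, 3, 1))
print(rank_of_constant(4, 1000, 1))
```

For the f(4, 1000) case from the question, this puts (1, 1, 1, 1) at position 1001001001, so roughly a billion tuples are consumed before the first value with maximum 1 is yielded. Whether that is acceptable depends on the use case.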

Answer 5

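Here is an algorithm for generating the next lexicographic permutation (by the way, I also like the idea of treating each set as numbers in a different base, e.g. base 1, base 2, and so on):

While not all digits are maximized:
  Advance the digits to the right of the left-most maximum with the following rule:
    increment the right-most digit that is not maximized and set all digits to its right to zero.
  If they are all maximized, increment the first digit to their left. If that digit is now maximized, set all digits
  to its right to zero; otherwise, set the right-most digit to the maximum and the digits in between to zero.

Python code: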

def nextP(perm,top):
  if all (i == top for i in perm):
    return None

  left_max = perm.index(top)

  if all (i == top for i in perm[left_max:]):
    perm[left_max - 1] = perm[left_max - 1] + 1
    perm[left_max:] = [0] * (len(perm) - left_max - 1) + ([0] if perm[left_max - 1] == top else [top])
  else:
    right_max = len(perm) - next(x[0] for x in enumerate(perm[left_max + 1:][::-1]) if x[1] < top) - 1
    perm = perm[:right_max] + [perm[right_max] + 1] + [0] * (len(perm) - right_max - 1)

  return perm

Example:

permutation = [0,0,2]

while permutation:
  print(permutation)
  permutation = nextP(permutation,2)

[0, 0, 2]
[0, 1, 2]
[0, 2, 0]
[0, 2, 1]
[0, 2, 2]
[1, 0, 2]
[1, 1, 2]
[1, 2, 0]
[1, 2, 1]
[1, 2, 2]
[2, 0, 0]
[2, 0, 1]
[2, 0, 2]
[2, 1, 0]
[2, 1, 1]
[2, 1, 2]
[2, 2, 0]
[2, 2, 1]
[2, 2, 2]
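The example covers only the group whose maximum is 2. To get the complete order asked for in the question, nextP can be restarted for each maximum in turn, beginning with the lexicographically smallest list containing it. The driver `by_max` below is a hypothetical wrapper, not part of the answer, and assumes width >= 1; it yields copies because nextP sometimes modifies its argument in place.

```python
def nextP(perm, top):
    # Copied from the answer above.
    if all(i == top for i in perm):
        return None
    left_max = perm.index(top)
    if all(i == top for i in perm[left_max:]):
        perm[left_max - 1] = perm[left_max - 1] + 1
        perm[left_max:] = [0] * (len(perm) - left_max - 1) + ([0] if perm[left_max - 1] == top else [top])
    else:
        right_max = len(perm) - next(x[0] for x in enumerate(perm[left_max + 1:][::-1]) if x[1] < top) - 1
        perm = perm[:right_max] + [perm[right_max] + 1] + [0] * (len(perm) - right_max - 1)
    return perm

def by_max(width, top):
    # Hypothetical driver: one pass of nextP per maximum t.
    for t in range(top):
        perm = [0] * (width - 1) + [t]   # smallest list whose maximum is t
        while perm:
            yield list(perm)             # copy, since nextP may mutate perm
            perm = nextP(perm, t)

for value in by_max(2, 2):
    print(value)
```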

Answer 6

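First note that you can easily generate the list of unique solutions with 2 as the maximum from the list of unique solutions with 1 as the maximum: just increment every possible combination of the 1s. For example, from [1,0,1] you generate [2,0,1], [1,0,2] and [2,0,2]. This suggests the following solution: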

import itertools

def g(n) :
    if n == 0 :
        yield [ 0,0,0 ]
    else :
        for x in g(n-1) : # for each solution containing `1` as the maximum
            idx = [ i for (i,xi) in enumerate(x) if xi == n-1 ] # locate the '1' to be incremented
            for j in range(1,len(idx)+1) : # increment one '1', then two '1', then three '1', etc
                for tup in itertools.combinations( idx, j ) : # all possible combinations of j '1'
                    y = list(x)
                    for t in tup : # prepare the new solution
                        y[t] += 1
                    yield y

Example:

list( g(0) )

[[0, 0, 0]]

list( g(1) )

[[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]

list( g(2) )

[[2, 0, 0],
 [0, 2, 0],
 [0, 0, 2],
 [2, 1, 0],
 [1, 2, 0],
 [2, 2, 0],
 [2, 0, 1],
 [1, 0, 2],
 [2, 0, 2],
 [0, 2, 1],
 [0, 1, 2],
 [0, 2, 2],
 [2, 1, 1],
 [1, 2, 1],
 [1, 1, 2],
 [2, 2, 1],
 [2, 1, 2],
 [1, 2, 2],
 [2, 2, 2]]
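The function above hard-codes a width of 3 and returns a single level. A sketch of how it might be generalized, with the width as a parameter and a driver that chains the levels; the names `level` and `all_levels` are invented for this illustration. Note that each level recomputes the previous one through the recursion, so this trades extra work for not storing anything.

```python
from itertools import combinations

def level(width, n):
    # Same idea as g(n) above, with the width as a parameter.
    if n == 0:
        yield [0] * width
        return
    for x in level(width, n - 1):
        # positions holding the previous maximum n-1
        idx = [i for i, xi in enumerate(x) if xi == n - 1]
        for j in range(1, len(idx) + 1):
            for tup in combinations(idx, j):
                y = list(x)
                for t in tup:
                    y[t] += 1
                yield y

def all_levels(width, top):
    # Hypothetical driver: levels 0 .. top-1 in order of their maximum.
    for n in range(top):
        for v in level(width, n):
            yield v

print(list(all_levels(2, 2)))
```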
