What is the most efficient way to find the number of possible unique fixed-length permutations?

Time: 2016-10-09 19:50:28

Tags: python algorithm

I have this dictionary:

num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)],
    }

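I need to find the number of unique 3-long combinations of the first value of each tuple, where a key can only be followed by one of the tuples in its own value list.

My current code for finding all unique (3-long) combinations is: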

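def count_unique_triples(num_dict):
    ans_set = set()
    for x in num_dict:              # first element of the triple
        for y in num_dict[x]:       # tuples allowed to follow x
            for z in num_dict[y]:   # tuples allowed to follow y
                ans_set.add((x[0], y[0], z[0]))
    return len(ans_set)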

This returns 10. ans_set ends up being:

{
 (2, 2, 2), (1, 2, 2), (1, 4, 4),
 (2, 2, 4), (1, 1, 2), (4, 4, 4),
 (1, 2, 4), (1, 1, 4), (1, 1, 1),
 (2, 4, 4)
}

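But I don't actually care what they are, only how many of them there are.

This method is inefficient because it actually generates every possible combination and puts it in a set.

I don't need to know each unique combination, I only need to know how many there are.

I feel like this can be done, perhaps by using the lengths of the value lists? But I can't wrap my head around it.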
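
To make the difficulty concrete, here is a minimal sketch of what the list lengths alone give you. The helper names count_paths and count_unique_triples are made up for illustration. Summing lengths counts every path x -> y -> z, while the set counts distinct triples of first values, and the two disagree as soon as different paths share the same first values.

```python
num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)],
}


def count_paths(graph):
    """Count 3-step paths x -> y -> z using only the list lengths."""
    return sum(len(graph[y]) for x in graph for y in graph[x])


def count_unique_triples(graph):
    """Count distinct (x[0], y[0], z[0]) triples by enumerating every path."""
    return len({(x[0], y[0], z[0])
                for x in graph for y in graph[x] for z in graph[y]})


print(count_paths(num_dict))           # 38
print(count_unique_triples(num_dict))  # 10
```

So the lengths give the number of paths (38 here) without any inner loop, but not the number of unique triples (10).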

Questions asking me to clarify what I need are welcome, as I realize I may not have explained it in the clearest way.

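Final edit

After re-evaluating what I need this to do, I found the best way to get the number of triples. This method doesn't actually find the triples, it only counts them.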

def foo(l):
    llen = len(l)
    total = 0
    cache = {}
    for i in range(llen):
        cache[i] = 0
    for x in range(llen):
        for y in range(x + 1, llen):
            if l[y] % l[x] == 0:
                cache[y] += 1
                total += cache[x]
    return total
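
As a sanity check, the same counting idea can be compared against a brute-force enumeration. This is only a sketch: the names count_divisor_triples and brute_force are hypothetical, and it assumes the input is a list of positive integers (a zero would make the modulo fail).

```python
from itertools import combinations


def count_divisor_triples(values):
    """Count index triples i < j < k where values[i] divides values[j]
    and values[j] divides values[k], without building the triples."""
    # counts[j] = number of earlier indices i with values[j] % values[i] == 0
    counts = [0] * len(values)
    total = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[j] % values[i] == 0:
                counts[j] += 1
                # every divisor pair ending at i extends to a triple ending at j
                total += counts[i]
    return total


def brute_force(values):
    """Reference implementation: enumerate every ordered triple explicitly."""
    return sum(1 for a, b, c in combinations(values, 3)
               if b % a == 0 and c % b == 0)


sample = [1, 2, 3, 4, 5, 6]
print(count_divisor_triples(sample))  # 3: (1, 2, 4), (1, 2, 6), (1, 3, 6)
print(brute_force(sample))            # 3
```

The counting version needs two nested loops instead of three, because counts[i] is already final by the time index i is used as the middle element.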

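Here is a more verbose version of the function that explains the thought process (though it is not suitable for large lists because of all the printing):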

def bar(l):
    list_length = len(l)
    total_triples = 0
    cache = {}
    for i in range(list_length):
        cache[i] = 0
    for x in range(list_length):
        print("\n\nfor index[{}]: {}".format(x, l[x]))
        for y in range(x + 1, list_length):
            print("\n\ttry index[{}]: {}".format(y, l[y]))
            if l[y] % l[x] == 0:
                print("\n\t\t{} can be evenly divided by {}".format(l[y], l[x]))
                cache[y] += 1
                total_triples += cache[x]
                print("\t\tcache[{0}] is now {1}".format(y, cache[y]))
                print("\t\tcount is now {}".format(total_triples))
                print("\t\t(+{} from cache[{}])".format(cache[x], x))
            else:
                print("\n\t\tfalse")
    print("\ntotal number of triples:", total_triples)

1 Answer:

Answer 0: (score: 1)

If I understand you correctly:

from itertools import combinations

num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)]
    }
set(combinations([k[0] for k in num_dict.keys()], 3))

Output:

{(1, 4, 1),
 (2, 1, 1),
 (2, 1, 4),
 (2, 2, 1),
 (2, 2, 4),
 (2, 4, 1),
 (2, 4, 4),
 (4, 1, 1),
 (4, 1, 4),
 (4, 4, 1)}

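Its len() is 10.

Basically, what you do is take the first element of each dict key, build all combinations of length 3 with itertools.combinations, and then convert the result to a set to eliminate duplicates.

Update

Since you updated the question with the desired output data, you can do the following: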

from itertools import combinations_with_replacement
list(combinations_with_replacement(set([k[0] for k in num_dict.keys()]), 3))

Output:

[(1, 1, 1),
 (1, 1, 2),
 (1, 1, 4),
 (1, 2, 2),
 (1, 2, 4),
 (1, 4, 4),
 (2, 2, 2),
 (2, 2, 4),
 (2, 4, 4),
 (4, 4, 4)]
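
If only the count is needed, the list does not have to be built at all: the number of combinations with replacement of k items from n distinct values is C(n + k - 1, k). The sketch below assumes Python 3.8+ for math.comb. It also assumes that every non-decreasing triple of first values is reachable, which happens to hold for this dictionary. Because only the keys are used and the value lists are ignored, it will not match the loop-based count in general.

```python
from itertools import combinations_with_replacement
from math import comb  # available from Python 3.8

# Keys of num_dict from the question; only their first elements matter here.
keys = [(2, 3), (2, 2), (4, 5), (1, 0), (4, 4), (1, 1)]
distinct_firsts = {k[0] for k in keys}  # {1, 2, 4}

n, k = len(distinct_firsts), 3
closed_form = comb(n + k - 1, k)  # count without generating anything
generated = len(list(combinations_with_replacement(sorted(distinct_firsts), k)))

print(closed_form, generated)  # 10 10
```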

UPD2

Regarding time consumption, I ran:

num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)]
    }
def a(num_dict):
    ans_set = set()
    for x in num_dict:
        for y in num_dict[x]:
            for z in num_dict[y]:
                ans_set.add((x[0], y[0], z[0]))
    return len(ans_set)
def b(num_dict):
    from itertools import combinations_with_replacement
    return len(list(combinations_with_replacement(set([k[0] for k in num_dict.keys()]), 3)))
%timeit a(num_dict)
%timeit b(num_dict)

The results are:

The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 12.1 µs per loop

The slowest run took 5.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.77 µs per loop

The solution I propose here is more than 2 times faster on this input (4.77 µs versus 12.1 µs per loop).
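
%timeit is an IPython magic, so the comparison above only runs inside IPython or Jupyter. A rough equivalent for a plain Python script, using the standard timeit module, might look like the sketch below. The repeat and call counts are arbitrary choices, and the absolute timings will differ from machine to machine.

```python
import timeit
from itertools import combinations_with_replacement

num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)],
}


def a(num_dict):
    """Loop-based count from the question."""
    ans_set = set()
    for x in num_dict:
        for y in num_dict[x]:
            for z in num_dict[y]:
                ans_set.add((x[0], y[0], z[0]))
    return len(ans_set)


def b(num_dict):
    """Key-based count from this answer."""
    firsts = set(k[0] for k in num_dict)
    return len(list(combinations_with_replacement(firsts, 3)))


CALLS = 10_000  # calls per measurement (arbitrary)
for func in (a, b):
    # Take the best of 3 measurements and report the time per call.
    best = min(timeit.repeat(lambda: func(num_dict), number=CALLS, repeat=3))
    print(f"{func.__name__}: {best / CALLS * 1e6:.2f} µs per loop")
```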