优化递归代码以从输入数组生成有效数组

时间:2019-07-18 21:16:58

标签: python recursion optimization

我有一个递归Python函数,该函数从包含表示一周中不同日期的元素的不同“类型”的输入数组生成有效的输出数组,例如[m1, m2, m3, m4, t1, t2, t3, t4, w1, w2, w3, w4]

为了满足我的需求,我能够找出一个递归函数(来自另一个堆栈溢出程序的帮助),该函数可以获取输入数组并根据约束条件返回有效数组:

  1. 如果存在特定日期的元素,则必须至少有四个。
  2. 某天的元素必须是连续的。
  3. 总共必须有十二个元素

示例输入

[m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12,t1,t2,t3,t4,w1,w2,w3,w4,f1,f2,f3,f4]

示例输出:

[m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12](可以是一种类型,因为其他类型不存在)

[m5,m6,m7,m8,m9,m10,m11,m12,t1,t2,t3,t4](或每种类型至少4个顺序)

[m4,m5,m6,m7,w1,w2,w3,w4,f1,f2,f3,f4](每种类型至少有4种,但可能会丢失)等。

无效:

[m4,m6,m5,m7,w1,w2,w3,w4,f1,f2,f3,f4](乱序)

[m4,m5,m6,m7,m8,w1,w2,w3,w4,f1,f2,f3](每种类型都不是4)

有效的代码:

import collections
import re

data =  ['f13', 'f14', 'f15', 'f16', 'f17', 'w0', 'w1', 'w2', 'w3', 't4', 't5', 't6', 't7', 't8', 't9', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'm0', 'm1', 'm2', 'm3']

def combo(d, c = []):
  if len(c) == 12:
     yield c
  else:
     for i in d:
        _count1 = collections.Counter([re.findall('^[a-zA-Z]+', j)[0] for j in c])
        _count2 = collections.Counter([re.findall('^[a-zA-Z]+', j)[0] for j in c+[i]])
        if i not in c:
           if len(c) < 11 or all(b >= 4 for b in _count2.values()):
              if re.findall('^[a-zA-Z]+', i)[0] in _count1:
                 if int(re.findall('\d+$', i)[0])-1 == int(re.findall('\d+$', c[-1])[0]) and re.findall('^[a-zA-Z]+', i)[0] == re.findall('^[a-zA-Z]+', c[-1])[0]:
                    yield from combo(d, c+[i])
              else:
                 yield from combo(d, c+[i])

result = combo(data)

print(next(result))

输出

 ['f13','f14','f15','f16','w0','w1','w2','w3','t4','t5','t6','t7']

此函数成功返回正确/有效的计划,但是要获得第一个成功的结果,需要299秒。有没有一种方法可以优化代码,或以某种方式处理输入数组以使这些结果可以更快地返回?谢谢

为澄清起见进行编辑:

我需要有一个函数(就像我现在拥有的那样)为输入生成所有可能的输出,该函数根据我的约束是有效的,最好以类似生成器的方式,所以我可以在需要时一次遍历它,以查看该组合在我的程序中是否有效。

例如,使用相同的输入

data =  ['f13', 'f14', 'f15', 'f16', 'f17', 'w0', 'w1', 'w2', 'w3', 't4', 't5', 't6', 't7', 't8', 't9', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'm0', 'm1', 'm2', 'm3']

我可能会有类似的输出


 ['f13','f14','f15','f16','w0','w1','w2','w3','t4','t5','t6','t7']

 ['f14','f15','f16','f17','w0','w1','w2','w3','t4','t5','t6','t7']

 ['f13','f14','f15','f16','r4','r5','r6','r7','t4','t5','t6','t7']

使用其他输入

data =  ['m0','m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8','m9','m10','m11','t0','t1', 't2', 't3', 't4', 't5']

我可能会有类似的输出


 ['m0','m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8','m9','m10','m11']

 ['m0','m1', 'm2', 'm3', 'm4', 'm5', 't0','t1', 't2', 't3', 't4','t5']

 ['m0','m1', 'm2', 'm3', 'm4', 'm5', 'm6','t1', 't2', 't3', 't4','t5']

注意:对于我的需求,以下输出将是等效的,但不必仅打印其中一个输出



 ['m0','m1', 'm2', 'm3', 'm4', 'm5', 't0','t1', 't2', 't3', 't4','t5']

 [ 't0','t1', 't2', 't3', 't4', 't5', 'm0','m1', 'm2', 'm3', 'm4', 'm5']

1 个答案:

答案 0 :(得分:1)

您可以尝试使用此代码。我对数据进行预处理(从字符串中提取数字并对其进行排序),使其在每次迭代中均不执行regex

import re
from itertools import groupby
from itertools import combinations

data =  ['m0','m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8','m9','m10','m11','t0','t1', 't2', 't3', 't4', 't5']

# returns eg.:
# {'f': [13, 14, 15, 16, 17], 'w': [0, 1, 2, 3], 't': [4, 5, 6, 7, 8, 9], 'r': [4, 5, 6, 7, 8, 9], 'm': [0, 1, 2, 3]}
def preprocess_data(data):
    out = {}
    for item in data:
        for k, v in re.findall(r'(\w)(\d+)', item):
            out.setdefault(k, []).append(int(v))
    for k in out:
        out[k].sort()
    return out

# 1. if an element from a specific day is present, there must be atleast 4 of them
# 2. the elements from a certain day must be sequential <- they are, because we preprocessed the data
# 3. must be 12 total elements
def check(data):
    rv = {}
    keys = set()
    for k, v in data.items():
        for vv, gg in groupby(enumerate(v), lambda k: k[0]-k[1]):
            consecutive_elements = [ii[1] for ii in gg]

            keys.add(k)
            for i in range(4, len(consecutive_elements) + 1):
                rv.setdefault(k, []).append(consecutive_elements[:i])

            break

    for k in [*rv.keys()]:
        rv[k].append([])

    for c in combinations([(k, i) for k, v in rv.items() for i in v], len(rv)):
        if any(len(i[1]) < 4 for i in c if len(i[1]) > 0):
            continue

        elements = [i[0] for i in c]
        if len(elements) != len(set(elements)):
            continue

        c2 = tuple(i[0] + str(ii) for i in c for ii in i[1])

        if len(c2) == 12:
            yield c2

def get_valid_combinations(data, dont_rotate=[], seen=set()):
    for c in check(data):
        if c not in seen:
            seen.add(c)
            yield c

    for k, v in data.items():
        if k in dont_rotate:
            continue
        for n in range(len(v)):
            data[k] = v[n:] + v[:n]
            yield from get_valid_combinations(data, dont_rotate + [k], seen)

for a in get_valid_combinations(preprocess_data(data)):
    print(a)

打印:

('m0', 'm1', 'm2', 'm3', 'm4', 'm5', 't0', 't1', 't2', 't3', 't4', 't5')
('m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 't0', 't1', 't2', 't3', 't4')
('m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 't0', 't1', 't2', 't3')
('m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9', 'm10', 'm11')
('m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 't1', 't2', 't3', 't4', 't5')
('m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 't1', 't2', 't3', 't4')
('m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 't2', 't3', 't4', 't5')
('m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 't0', 't1', 't2', 't3', 't4')
('m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 't0', 't1', 't2', 't3')
('m1', 'm2', 'm3', 'm4', 'm5', 'm6', 't0', 't1', 't2', 't3', 't4', 't5')
('m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 't1', 't2', 't3', 't4', 't5')
('m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 't1', 't2', 't3', 't4')
('m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 't2', 't3', 't4', 't5')
('m2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9', 't0', 't1', 't2', 't3')
('m2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 't0', 't1', 't2', 't3', 't4')
('m2', 'm3', 'm4', 'm5', 'm6', 'm7', 't0', 't1', 't2', 't3', 't4', 't5')
('m2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 't1', 't2', 't3', 't4', 't5')
('m2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9', 't1', 't2', 't3', 't4')
('m2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9', 't2', 't3', 't4', 't5')
('m3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9', 'm10', 't0', 't1', 't2', 't3')
('m3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9', 't0', 't1', 't2', 't3', 't4')
('m3', 'm4', 'm5', 'm6', 'm7', 'm8', 't0', 't1', 't2', 't3', 't4', 't5')
('m3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9', 't1', 't2', 't3', 't4', 't5')
('m3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9', 'm10', 't1', 't2', 't3', 't4')
('m3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9', 'm10', 't2', 't3', 't4', 't5')
('m4', 'm5', 'm6', 'm7', 'm8', 'm9', 'm10', 'm11', 't2', 't3', 't4', 't5')
('m4', 'm5', 'm6', 'm7', 'm8', 'm9', 'm10', 'm11', 't0', 't1', 't2', 't3')
('m4', 'm5', 'm6', 'm7', 'm8', 'm9', 'm10', 't0', 't1', 't2', 't3', 't4')
('m4', 'm5', 'm6', 'm7', 'm8', 'm9', 't0', 't1', 't2', 't3', 't4', 't5')
('m4', 'm5', 'm6', 'm7', 'm8', 'm9', 'm10', 't1', 't2', 't3', 't4', 't5')
('m4', 'm5', 'm6', 'm7', 'm8', 'm9', 'm10', 'm11', 't1', 't2', 't3', 't4')
('m5', 'm6', 'm7', 'm8', 'm9', 'm10', 'm11', 't1', 't2', 't3', 't4', 't5')
('m5', 'm6', 'm7', 'm8', 'm9', 'm10', 'm11', 't0', 't1', 't2', 't3', 't4')
('m5', 'm6', 'm7', 'm8', 'm9', 'm10', 't0', 't1', 't2', 't3', 't4', 't5')
('m6', 'm7', 'm8', 'm9', 'm10', 'm11', 't0', 't1', 't2', 't3', 't4', 't5')