Question

I'm looking for an efficient way to find the core pattern shared by two different lists. Let me explain:

List 1:

[10318, 6032,1518, 4061, 4380, 73160, 83607, 9202, 28812, 40359, 28457, 
 3292, 2678, 8492, 7149, 19417, 7372, 8534, 3889, 11123, 8415, 5989]

List 2:

[5760, 1541, 2085, 637,1518, 4061, 4380, 73160, 83607, 9202, 28812, 40359, 
 28457, 3292, 2678, 8492, 7149, 19417, 7372, 8534, 3889, 11123]

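Both lists may have 300+ elements, and the share of elements they have in common is very large every time (possibly more than 60%).

My goal is to find the "core" of each list. Every 5 minutes there will be a new list, which will be compared with the previous one. What I'm interested in is the part that is not the core. In other words, I need to retrieve the beginning of the list up to the (already identified) core of the previous list.

Efficiency is key: a new list arrives every 5 minutes, but hundreds of them are processed in parallel.

Any algorithm, mathematical approach, or solution would help :)

I hope my request is clear.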

Answer 1

This does it for your sample; I have not tried it with large lists. The intermediate powersets get huge, so it may not be the right choice:

from itertools import chain,product,islice

l1 = [10318, 6032,1518, 4061, 4380, 73160, 83607, 9202, 28812, 40359, 28457, 
 3292, 2678, 8492, 7149, 19417, 7372, 8534, 3889, 11123, 8415, 5989]

l2 = [5760, 1541, 2085, 637,1518, 4061, 4380, 73160, 83607, 9202, 28812, 40359, 
 28457, 3292, 2678, 8492, 7149, 19417, 7372, 8534, 3889, 11123]

# not really a recipe - but inspired by partition and powerset
# from https://docs.python.org/3/library/itertools.html#itertools-recipes
def powerskiptakeset(iterab): 
    """Creates all non-empty contiguous slices of a given iterable, 
       in existing order, from len(1) to len(iterab). 

    Example: 
        [1,2,3,4] --> {(1,), (2,), (3,), (4,), (1, 2), (2, 3), (3, 4),
                       (1, 2, 3), (2, 3, 4),  (1, 2, 3, 4)}"""
    s = list(iterab)
    return set(chain.from_iterable([tuple(islice(s, start, stop))] for 
                               start,stop in product(range(len(s)+1),range(len(s)+1)) 
                               if start < stop))


l1_set = powerskiptakeset(l1)   
l2_set = powerskiptakeset(l2)

# longest shared slice; default=() avoids a ValueError when nothing is shared
core = max(l1_set & l2_set, key=len, default=())

print(list(core))

Output:

[1518, 4061, 4380, 73160, 83607, 9202, 28812, 40359, 28457, 3292, 
 2678, 8492, 7149, 19417, 7372, 8534, 3889, 11123]
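As a possible alternative that avoids building every slice, the standard library's `difflib.SequenceMatcher` can locate the longest shared contiguous run directly. The sketch below is not part of the original answer. It assumes that List 1 is the previous list and List 2 the newer one, and the names `find_core`, `previous`, `current`, and `new_part` are made up for illustration.

```python
from difflib import SequenceMatcher


def find_core(previous, current):
    """Return (start_in_previous, start_in_current, length) of the
    longest contiguous run shared by both lists."""
    # autojunk=False switches off the "popular element" heuristic that
    # SequenceMatcher applies to sequences of 200 or more items.
    matcher = SequenceMatcher(None, previous, current, autojunk=False)
    match = matcher.find_longest_match(0, len(previous), 0, len(current))
    return match.a, match.b, match.size


# Assumption: List 1 is the older list, List 2 the newer one.
previous = [10318, 6032, 1518, 4061, 4380, 73160, 83607, 9202, 28812, 40359,
            28457, 3292, 2678, 8492, 7149, 19417, 7372, 8534, 3889, 11123,
            8415, 5989]
current = [5760, 1541, 2085, 637, 1518, 4061, 4380, 73160, 83607, 9202, 28812,
           40359, 28457, 3292, 2678, 8492, 7149, 19417, 7372, 8534, 3889, 11123]

start_prev, start_cur, size = find_core(previous, current)
core = current[start_cur:start_cur + size]   # the shared run
new_part = current[:start_cur]               # items in front of the core

print(core)
print(new_part)
```

For the sample lists, `new_part` is `[5760, 1541, 2085, 637]`. If the lists share nothing, `size` is 0 and `new_part` is empty, so a caller would probably want to check `size` before trusting the prefix.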

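For range(300), the set returned by powerskiptakeset contains 45150 elements. You can tune that, for example by restricting powerskiptakeset to slices whose length is at least 25% of the input iterable's length: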

from itertools import chain,product,islice

def powerskiptakeset_25perc(iterab): 
    """Creates contiguous slices of a given iterable with lengths from
       len(iterab)//4 to len(iterab). For 4 items the minimum length is 1,
       so nothing is filtered out:

    [1,2,3,4] --> set([(1, 2), (1, 2, 3, 4), (1,), (2,), (3,), (1, 2, 3), (2, 3), (2, 3, 4), (4,), (3, 4)])"""

    s = list(iterab)
    return set(chain.from_iterable([tuple(islice(s, start, stop))] for 
                               start,stop in product(range(len(s)+1),range(len(s)+1)) 
                               if start < stop and stop-start >= len(s)//4))

print(len(powerskiptakeset_25perc(range(300))))

This reduces the number of tuples in the set to about 25k (25651 for range(300)).
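Even the reduced set still holds thousands of tuples per list. One way to skip the sets entirely is the classic dynamic-programming approach to the longest common contiguous run, sketched below. This is a different technique from the set intersection above, and `longest_common_run` is a hypothetical helper name. It needs O(len(a) * len(b)) comparisons, roughly 90,000 for two 300-element lists, and keeps only two rows of the table in memory.

```python
def longest_common_run(a, b):
    """Return (start_in_a, start_in_b, length) of the longest contiguous
    run of equal elements shared by sequences a and b."""
    best_len = 0
    best_end_a = 0
    best_end_b = 0
    # prev[j] = length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i, x in enumerate(a, start=1):
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # extend the run that ended one step earlier in both lists
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end_a, best_end_b = i, j
        prev = curr
    return best_end_a - best_len, best_end_b - best_len, best_len


start_a, start_b, length = longest_common_run([1, 2, 3, 4, 5], [9, 2, 3, 4, 8])
print(start_a, start_b, length)
```

Applied to the question's lists as `longest_common_run(l1, l2)`, the second returned index marks where the core starts in `l2`, so `l2[:start_b]` would be the part in front of the core.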

Python 3.6 - Find the core pattern in two lists
