Question

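I am trying to extract lists/sublists from a larger list of integers in Python 2.7, using start and end patterns. I would like to do it with a function, but I cannot find a library, algorithm, or regular expression that solves this problem.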

def myFunctionForSublists(data, startSequence, endSequence):
    # ... todo

data = [99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99]

startSequence = [1,2,3]
endSequence = [4,5,6]

sublists = myFunctionForSublists(data, startSequence, endSequence)

print sublists[0] # [1, 2, 3, 99, 99, 99, 4, 5, 6]
print sublists[1] # [1, 2, 3, 99, 4, 5, 6]

Any ideas on how to achieve this?

Answer 1

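Here is a more general solution that does not require slicing the list, so you can use it on other iterables, such as generators.

We keep a deque the size of the start sequence and feed it until it equals the start sequence. Then we add those values to a list and keep iterating over the input. While doing so, we keep a deque the size of the end sequence until we see that sequence, adding every element to the list we are building. When we come across the end sequence, we yield the list and reset the deque to scan for the next start sequence.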

from collections import deque

def gen(l, start, stop):
    start_deque = deque(start)
    end_deque = deque(stop)
    curr_deque = deque(maxlen=len(start))
    it = iter(l)
    for c in it:
        curr_deque.append(c)
        if curr_deque == start_deque:
            potential = list(curr_deque)
            curr_deque = deque(maxlen=len(stop))
            for c in it:
                potential.append(c)
                curr_deque.append(c)
                if curr_deque == end_deque:
                    yield potential
                    curr_deque = deque(maxlen=len(start))
                    break

print(list(gen([99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99], [1,2,3], [4,5,6])))

# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]

Answer 2

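Here is an itertools approach that uses a collections.deque of limited length to keep a buffer of the last elements of the appropriate size. It assumes that your sublists do not overlap, and that your start and end sequences do not overlap either.

It works for any kind of sequence for data, start, and end (even generators).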

from collections import deque
from itertools import islice

def sublists(data, start, end):
    it = iter(data)
    start, end = deque(start), deque(end)
    try:
        while True:
            x = deque(islice(it, len(start)), len(start))
            # move forward until start is found
            while x != start:
                x.append(next(it))
            out = list(x)
            x = deque(islice(it, len(end)), len(end))
            # move forward until end is found, storing the sublist
            while x != end:
                out.append(x[0])
                x.append(next(it))
            out.extend(end)
            yield out
    except StopIteration:
        # data is exhausted: end the generator cleanly. Since Python 3.7
        # (PEP 479) a StopIteration leaking out of a generator body is
        # turned into a RuntimeError.
        return

data = [99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99]

startSequence = [1,2,3]
endSequence = [4,5,6]

print(list(sublists(data, startSequence, endSequence)))
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]

Answer 3

If you really want to use regular expressions, you can convert the list of integers into a string and apply the regex to that:

import re

def find_span(numbers, start, end):
    # Create strings from the start and end lists.
    start_pattern = ''.join(map(chr, start))
    end_pattern = ''.join(map(chr, end))

    # convert the list to search into one string.
    s = ''.join(map(chr, numbers))

    # Create a pattern that starts and ends with the correct sublists,
    # and match all sublists. Then convert each match back to a list of
    # integers.
    # Both patterns are escaped, because chr() can produce regex
    # metacharacters (for example chr(40) == '(').
    # The '?' is to make the regex non-greedy
    return [
        [ord(c) for c in match]
        for match in re.findall(rf'{re.escape(start_pattern)}.*?{re.escape(end_pattern)}', s, re.DOTALL)
    ]

>>> find_span(data, startSequence, endSequence)  # Using OP's sample values
[[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]

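Note that this is not really efficient, since it has to build a regular expression dynamically on every call. You also need re.DOTALL, because otherwise the pattern will not match anything containing 10 (the ASCII code of the newline character). But if you really want to use regular expressions, this will do it.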
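To see why re.DOTALL matters here, the following small sketch runs the same kind of pattern with and without the flag on a list that contains 10. The sample list [1, 10, 2] is made up for illustration and is not from the question.

```python
import re

# Hypothetical sample: 10 is the code point of '\n'.
numbers = [1, 10, 2]
s = ''.join(map(chr, numbers))

# Start pattern [1], end pattern [2], escaped in case they are metacharacters.
pattern = re.escape(chr(1)) + '.*?' + re.escape(chr(2))

# Without DOTALL, '.' refuses to match the newline, so nothing is found.
without_dotall = re.findall(pattern, s)

# With DOTALL, '.' matches every character, including the newline.
with_dotall = re.findall(pattern, s, re.DOTALL)

print(without_dotall)
print([[ord(c) for c in m] for m in with_dotall])
```

The first call finds nothing, while the second returns the whole span, which is why the flag is needed for arbitrary integer data.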

Answer 4

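Simply iterate over all the indices in the list and compare the slice starting at each one with startSequence or endSequence, respectively. Assuming the sublists should not overlap, you can use the same iterator for both loops.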

def myFunctionForSublists(data, startSequence, endSequence):
    positions = iter(range(len(data)))
    for start in positions:
        if data[start:start+len(startSequence)] == startSequence:
            for end in positions:
                if data[end:end+len(endSequence)] == endSequence:
                    yield data[start:end+len(endSequence)]
                    break

This way, the start loop resumes where the end loop left off. If the sublists may overlap, use two separate iterators for the loops, i.e. for start in range(len(data)): and for end in range(start+1, len(data)):
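The overlapping variant described above could look roughly like this. It is only a sketch: the function name overlapping_sublists and the sample list are made up for illustration.

```python
def overlapping_sublists(data, start_seq, end_seq):
    # Every index is tried as a possible start, even inside an earlier result.
    for start in range(len(data)):
        if data[start:start + len(start_seq)] == start_seq:
            # A fresh range per start, so the outer loop is never advanced.
            for end in range(start + 1, len(data)):
                if data[end:end + len(end_seq)] == end_seq:
                    yield data[start:end + len(end_seq)]
                    break


# Hypothetical sample with two start patterns before a single end pattern.
sample = [1, 2, 3, 1, 2, 3, 4, 5, 6]
found = list(overlapping_sublists(sample, [1, 2, 3], [4, 5, 6]))
print(found)
```

With this sample both start positions produce a result, and the second result lies entirely inside the first one.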

Answer 5

Use the following approach:

def find_sub_list(sl,l):
    sll=len(sl)
    for ind in (i for i,e in enumerate(l) if e==sl[0]):
        if l[ind:ind+sll]==sl:
            return ind,ind+sll-1

>>> find_sub_list([1,2,3], data)
(2, 4)
>>> find_sub_list([4,5,6], data)
(8, 10)

>>> data[2:10+1]
[1, 2, 3, 99, 99, 99, 4, 5, 6]

For sublists[1], you can take a similar approach.

Courtesy: find-starting-and-ending-indices-of-sublist-in-list
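One way to get sublists[1] as well is to repeat the search from where the previous match ended. The sketch below assumes non-overlapping sublists; the begin parameter and the all_spans helper are additions made for illustration and are not part of the linked answer.

```python
def find_sub_list(sl, l, begin=0):
    # Same idea as above, but the search starts at index `begin`
    # (a hypothetical extra parameter).
    sll = len(sl)
    for ind in range(begin, len(l) - sll + 1):
        if l[ind:ind + sll] == sl:
            return ind, ind + sll - 1
    return None


def all_spans(data, start_seq, end_seq):
    result, pos = [], 0
    while True:
        s = find_sub_list(start_seq, data, pos)
        if s is None:
            break
        # Look for the end pattern only after the start pattern.
        e = find_sub_list(end_seq, data, s[1] + 1)
        if e is None:
            break
        result.append(data[s[0]:e[1] + 1])
        pos = e[1] + 1  # continue after the sublist just found
    return result


data = [99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99]
spans = all_spans(data, [1, 2, 3], [4, 5, 6])
print(spans)
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
```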

Answer 6

Here is an O(n) solution that finds matches by keeping track of how far the startSequence and endSequence patterns have been matched.
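No code accompanies this answer here. One way to get linear-time matching while tracking how much of each pattern has been matched is the Knuth-Morris-Pratt failure table. The sketch below illustrates that idea under the assumptions that both patterns are non-empty and that sublists do not overlap; it is not this answer's own code, and all function names are hypothetical.

```python
def failure_table(pattern):
    # table[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it (standard KMP preprocessing).
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def advance(pattern, table, matched, item):
    # Return how many pattern elements are matched after consuming `item`.
    while matched and pattern[matched] != item:
        matched = table[matched - 1]
    if pattern[matched] == item:
        matched += 1
    return matched


def extract_sublists(data, start_seq, end_seq):
    start_tab, end_tab = failure_table(start_seq), failure_table(end_seq)
    result, current, matched = [], None, 0
    for item in data:
        if current is None:
            # Scanning for the start pattern.
            matched = advance(start_seq, start_tab, matched, item)
            if matched == len(start_seq):
                current, matched = list(start_seq), 0
        else:
            # Inside a sublist: collect items and scan for the end pattern.
            current.append(item)
            matched = advance(end_seq, end_tab, matched, item)
            if matched == len(end_seq):
                result.append(current)
                current, matched = None, 0
    return result


data = [99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99]
print(extract_sublists(data, [1, 2, 3], [4, 5, 6]))
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
```

Each input element is consumed once and never revisited, so this also works on generators and other one-pass iterables.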


Python: extract a list from a list using a module or a regular expression
