Question

Given a list like:

[a, SEP, b, c, SEP, SEP, d]

how can I split it into a list of sublists:

[[a], [b, c], [], [d]]

Effectively, I need a list version of str.split(). I could hack something together, but I can't come up with anything neat and/or Pythonic.

I get the input from an iterator, so a generator that works on that iterator is also acceptable.

More examples:

[a, SEP, SEP, SEP] -> [[a], [], [], []]

[a, b, c] -> [[a, b, c]]

[SEP] -> [[], []]

Answer 1

A simple generator will work for all of the cases in your question:

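def split(seq, sep):
    # Collect items into the current chunk until a separator closes it
    chunk = []
    for val in seq:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    # The final chunk (possibly empty) is always emitted
    yield chunk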
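As a quick check, the sketch below runs the same generator over the question's examples. It passes an iterator (iter(...)) rather than a list, since the question says the input comes from an iterator. The separator is given as a parameter and the items are plain strings; both are assumptions made just to get a runnable example.

```python
def split(seq, sep):
    # Accumulate items until a separator closes the current chunk
    chunk = []
    for val in seq:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    # The last chunk is always emitted, even if it is empty
    yield chunk

examples = [
    ["a", "SEP", "b", "c", "SEP", "SEP", "d"],
    ["a", "SEP", "SEP", "SEP"],
    ["a", "b", "c"],
    ["SEP"],
]
for ex in examples:
    # iter() makes sure nothing relies on len() or indexing
    print(list(split(iter(ex), "SEP")))
```

Because the generator only ever looks at one item at a time, it never needs the whole input in memory.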

Answer 2

I'm not sure whether there is a simple itertools.groupby solution, but here is an iterative approach that works:

def mySplit(iterable, sep):
    # Note: this uses len(), so it needs a sequence (list/tuple), not a bare iterator
    output = []
    sepcount = 0
    current_output = []
    last = len(iterable) - 1
    for i, elem in enumerate(iterable):
        if elem != sep:
            sepcount = 0
            current_output.append(elem)
            if i == last:
                output.append(current_output)
        else:
            if current_output:
                output.append(current_output)
                current_output = []

            sepcount += 1

            # a leading separator, or each extra separator in a run, adds an empty chunk
            if i == 0 or sepcount > 1:
                output.append([])
            # a trailing separator also adds an empty chunk
            if i == last:
                output.append([])

    return output

Testing with your examples:

testLists = [
    ['a', 'SEP', 'b', 'c', 'SEP', 'SEP', 'd'],
    ["a", "SEP", "SEP", "SEP"],
    ["SEP"],
    ["a", "b", "c"]
]

for tl in testLists:
    print(mySplit(tl, sep="SEP"))
#[['a'], ['b', 'c'], [], ['d']]
#[['a'], [], [], []]
#[[], []]
#[['a', 'b', 'c']]

This mirrors what you would get if the examples were actually strings and you used str.split(sep):

for tl in testLists:
    print("".join(tl).split("SEP"))
#['a', 'bc', '', 'd']
#['a', '', '', '']
#['', '']
#['abc']

By the way, if the elements of the list are always guaranteed to be single-character strings, you can simply do the following:

for tl in testLists:
    print([list(x) for x in "".join(tl).split("SEP")])
#[['a'], ['b', 'c'], [], ['d']]
#[['a'], [], [], []]
#[[], []]
#[['a', 'b', 'c']]
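To see why the join-and-split shortcut only suits single-character strings, here is a small check with made-up inputs: a multi-character item gets broken into characters, and items whose concatenation happens to spell the separator are misread as one.

```python
# "ab" is a single item, but list() breaks it into characters
items = ["ab", "SEP", "c"]
print([list(x) for x in "".join(items).split("SEP")])

# "S", "E", "P" are three ordinary items, yet together they spell the separator
tricky = ["S", "E", "P", "x"]
print([list(x) for x in "".join(tricky).split("SEP")])
```

The first call gives [['a', 'b'], ['c']] instead of [['ab'], ['c']], and the second gives [[], ['x']] instead of [['S', 'E', 'P', 'x']].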

But the mySplit() function is more general.
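Since the answer wonders about itertools.groupby, here is one possible groupby-based sketch (not part of the original answer; the name split_groupby is made up). It groups consecutive items by whether they equal sep. Each separator in a run starts a new, initially empty chunk, and each run of ordinary items extends the current chunk. It iterates only once and never calls len(), so it also accepts an iterator.

```python
from itertools import groupby

def split_groupby(iterable, sep):
    result = [[]]  # there is always at least one (possibly empty) chunk
    for is_sep, group in groupby(iterable, key=lambda x: x == sep):
        if is_sep:
            # every separator in the run opens a new, empty chunk
            for _ in group:
                result.append([])
        else:
            # a run of ordinary items extends the current chunk
            result[-1].extend(group)
    return result

print(split_groupby(iter(["a", "SEP", "b", "c", "SEP", "SEP", "d"]), "SEP"))
```

One design difference: for an empty input this returns [[]], which matches "".split("SEP") == [''], whereas mySplit([], "SEP") returns [].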

Answer 3

My first Python program :)

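from pprint import pprint

my_array = ["a", "SEP", "SEP", "SEP"]
my_temp = []
my_final = []
for item in my_array:
    if item != "SEP":
        my_temp.append(item)
    else:
        # a separator closes the current sublist
        my_final.append(my_temp)
        my_temp = []
# keep the last sublist too (empty if the list ended with a separator)
my_final.append(my_temp)
pprint(my_final)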

Answer 4

For list or tuple objects, you can use the following:

def split(seq, sep):
    start, stop = 0, -1
    while start < len(seq):
        try:
            stop = seq.index(sep, start)
        except ValueError:
            yield seq[start:]
            break
        yield seq[start:stop]
        start = stop + 1
    else:
        if stop == len(seq) - 1:
            yield []

This only works on sequences (it relies on len() and index()), but it is fast.
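One detail worth noting: the function yields slices of seq, so for a tuple the chunks are tuples, except the trailing empty chunk, which is always a list. A possible tweak (a sketch, not from the original answer) is to yield seq[:0], an empty slice of the same type as the input:

```python
def split(seq, sep):
    start, stop = 0, -1
    while start < len(seq):
        try:
            # position of the next separator at or after start
            stop = seq.index(sep, start)
        except ValueError:
            # no more separators: the rest is the last chunk
            yield seq[start:]
            break
        yield seq[start:stop]
        start = stop + 1
    else:
        # loop ended without break: the input was empty or ended with sep
        if stop == len(seq) - 1:
            # empty slice of the same type as seq ([] for lists, () for tuples)
            yield seq[:0]

print(list(split(("a", "SEP"), "SEP")))
```

With the original `yield []`, the same call would produce [('a',), []] instead of [('a',), ()].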

Answer 5

You can use itertools.takewhile:

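import itertools as it

def split(seq, sep):
    seq = iter(seq)
    last = sep  # last item examined; starts as sep so an empty input yields [[]]

    def not_sep(x):
        nonlocal last
        last = x
        return x != sep

    while True:
        try:
            peek = next(seq)
        except StopIteration:
            break
        yield list(it.takewhile(not_sep, it.chain((peek,), seq)))
    # the input ended right after a separator, so emit the trailing empty chunk
    if last == sep:
        yield []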

The it.chain part is there to detect when seq is exhausted. Note that with this approach it is easy to yield generators instead of lists, if needed.

Answer 6

@a_guest's itertools.takewhile approach, simplified:

def split(seq, sep):
    from itertools import takewhile
    iterator = iter(seq)
    while subseq := list(takewhile(lambda x: x != sep, iterator)):
        yield subseq

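Note that it stops at the first empty subsequence, so a leading, trailing, or doubled separator cuts the output short. It also requires Python 3.8+ for the := operator.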
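A quick demonstration of that limitation, using the simplified function as written (inputs are made up for illustration):

```python
from itertools import takewhile

def split(seq, sep):
    iterator = iter(seq)
    # the loop ends as soon as takewhile produces an empty list
    while subseq := list(takewhile(lambda x: x != sep, iterator)):
        yield subseq

print(list(split(["a", "b", "c"], "SEP")))
print(list(split(["a", "SEP", "SEP", "d"], "SEP")))
print(list(split(["SEP", "a"], "SEP")))
```

The first call works as expected, but the second returns only [['a']] (the empty chunk and ['d'] are lost), and the third returns [] because the very first chunk is empty.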

Answer 7

I would define the following function to solve the problem.

l = ['a', 'SEP', 'b', 'c', 'SEP', 'SEP', 'd']

def sublist_with_words(word, search_list):
    res = []
    for i in range(search_list.count(word)):
        index = search_list.index(word)
        res.append(search_list[:index])
        search_list = search_list[index+1:]
    res.append(search_list)
    return res

When I try the cases you proposed:

print(sublist_with_words(word = 'SEP', search_list=l))
print(sublist_with_words(word = 'SEP', search_list=['a', 'b', 'c']))
print(sublist_with_words(word = 'SEP', search_list=['SEP']))

The output is:

[['a'], ['b', 'c'], [], ['d']]
[['a', 'b', 'c']]
[[], []]
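sublist_with_words() calls count() and then re-slices the remaining list once per separator, which copies data repeatedly. A possible alternative (a sketch, not part of the original answer; the name split_at_indices is made up) is to find all separator positions in one pass and slice between consecutive positions:

```python
def split_at_indices(seq, sep):
    # positions of every separator, found in a single pass
    idx = [i for i, x in enumerate(seq) if x == sep]
    # pair consecutive boundaries: (-1, first sep), ..., (last sep, len(seq))
    bounds = zip([-1] + idx, idx + [len(seq)])
    return [seq[start + 1:end] for start, end in bounds]

print(split_at_indices(['a', 'SEP', 'b', 'c', 'SEP', 'SEP', 'd'], 'SEP'))
```

The virtual boundaries at -1 and len(seq) make leading and trailing separators produce empty chunks without special cases.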

Answer 8

Here is a non-general solution that (most likely) only works for lists of ints:

import re

def split_list(nums, n):
    nums_str = str(nums)
    splits = nums_str.split(f"{n},")

    patc = re.compile(r"\d+")
    group = []
    for part in splits:
        group.append([int(v) for v in patc.findall(part)])

    return group

if __name__ == "__main__":
    l = [1, 2, 3, 4, 3, 6, 7, 3, 8, 9, 10]
    n = 3
    split_l = split_list(l, n)
    assert split_l == [[1, 2], [4], [6, 7], [8, 9, 10]]

How to split a list into sublists based on a separator, similar to str.split()?

8 answers