Question

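I have a list that contains certain marker elements. I want to split this list into sublists, starting a new sublist at each of those elements. For example: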

test_list = ['a and b, 123','1','2','x','y','Foo and Bar, gibberish','123','321','June','July','August','Bonnie and Clyde, foobar','today','tomorrow','yesterday']

Whenever an element matches a certain pattern, I want to start a new sublist:

new_list = [['a and b, 123', '1', '2', 'x', 'y'], ['Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August'], ['Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']]

So far I can do this only when a fixed number of items follows each marker element. For example:

import re
element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')
new_list = [test_list[i:(i+4)] for i, x in enumerate(test_list) if element_regex.match(x)]

This is almost there, but the marker element is not always followed by exactly three items. Is there a better way than looping over every item?

Answer 1

If you want a one-liner,

from functools import reduce  # needed in Python 3, where reduce is no longer a builtin
new_list = reduce(lambda a, b: a[:-1] + [ a[-1] + [ b ] ] if not element_regex.match(b) or not a[0] else a + [ [ b ] ], test_list, [ [] ])

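will do it. However, the Pythonic way would be to use a more verbose variant.

I took some timing measurements on a 4-core i7 @ 2.1 GHz. Running this code 1,000,000 times with the timeit module takes 11.38 s. Using groupby from the itertools module (Kasra's variant from the other answer) takes 9.92 s. The fastest variant is the verbose version I suggested, which takes only 5.66 s: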

new_list = []
for i in test_list:
    # Start a new sublist at each match (and for any items before the first match)
    if element_regex.match(i) or not new_list:
        new_list.append([])
    new_list[-1].append(i)
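
For anyone who wants to repeat the measurement, here is a rough sketch of how the loop variant could be timed with timeit. It assumes the test_list and element_regex from the question; the function name split_loop is made up for this example, the loop is guarded so that no empty leading sublist is produced, and the run count is kept far below the 1,000,000 runs mentioned above. Absolute numbers will differ from machine to machine.

```python
import re
import timeit

test_list = ['a and b, 123', '1', '2', 'x', 'y',
             'Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August',
             'Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']
element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')


def split_loop(items, pattern):
    """Hypothetical helper: start a new sublist at every matching element."""
    result = []
    for item in items:
        # 'not result' also opens a sublist for items before the first match.
        if pattern.match(item) or not result:
            result.append([])
        result[-1].append(item)
    return result


# 10,000 runs only, to keep the sketch quick; raise 'number' for stabler figures.
elapsed = timeit.timeit(lambda: split_loop(test_list, element_regex), number=10_000)
print(f"{elapsed:.3f} s for 10,000 runs")
```

Passing a lambda to timeit.timeit avoids having to put the code in a string, at the cost of one extra function call per run.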

Answer 2

You don't need a regex; you can just use itertools.groupby:

>>> from itertools import groupby
>>> from operator import add
>>> g_list=[list(g) for k,g in groupby(test_list , lambda i : 'and' in i)]
>>> [add(*g_list[i:i+2]) for i in range(0,len(g_list),2)]
[['a and b, 123', '1', '2', 'x', 'y'], ['Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August'], ['Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']]

First we group the list with the lambda function lambda i : 'and' in i, which finds the elements containing "and". That gives us:

>>> g_list
[['a and b, 123'], ['1', '2', 'x', 'y'], ['Foo and Bar, gibberish'], ['123', '321', 'June', 'July', 'August'], ['Bonnie and Clyde, foobar'], ['today', 'tomorrow', 'yesterday']]

So we have to concatenate these lists in pairs, which we do with the add operator and a list comprehension.
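
Note that the pairwise add relies on the groups strictly alternating between one marker and its followers. If two markers are adjacent, or the last marker has nothing after it, the pairing goes wrong. A possible groupby-based sketch that avoids this numbers each item with a running count of the markers seen so far and groups on that number. The function name split_by_marker is made up for this example, and it uses the question's regex rather than the 'and' in i test; this is one way to do it, not the approach of the answer above.

```python
import re
from itertools import accumulate, groupby

element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')


def split_by_marker(items, pattern=element_regex):
    """Hypothetical helper: group items by how many markers precede or equal them."""
    items = list(items)  # iterated twice below, so materialise it first
    # Running count of markers: each marker bumps the group id for itself
    # and for every item that follows it.
    group_ids = accumulate(1 if pattern.match(x) else 0 for x in items)
    # groupby collects consecutive items that share the same group id.
    return [[x for _, x in grp]
            for _, grp in groupby(zip(group_ids, items), key=lambda pair: pair[0])]


test_list = ['a and b, 123', '1', '2', 'x', 'y',
             'Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August',
             'Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']
new_list = split_by_marker(test_list)
```

With this keying, adjacent markers each get their own sublist, and items that appear before the first marker end up in a sublist of their own.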

Splitting a list by matching a regular expression against its elements
