Question

I have a list named values that contains a series of numbers:

values = [0, 1, 2, 3, 4, 5, ... , 351, 0, 1, 2, 3, 4, 5, 6, ... , 750, 0, 1, 2, 3, 4, 5, ... , 559]

I want to create a new list containing sublists, each of which runs from 0 up to a number.

Like this:

new_values = [[0, 1, 2, ... , 351], [0, 1, 2, ... , 750], [0, 1, 2, ... , 559]]

The code I wrote is:

start = 0
new_values = []
for i,val in enumerate(values): 
    if(val == 0):
        new_values.append(values[start:i]) 
        start = i

However, it returns:

new_values = [[], [0, 1, 2, ... , 750], [0, 1, 2, ... , 559]]

How can I fix my code? Any help would be greatly appreciated.

Answer 1

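So, the problem with the code you wrote is that it includes an empty list at the beginning and omits the final sublist. The minimalist fix for this is to:

Change the test to avoid appending the first list (when i is 0), e.g. if val == 0 and i != 0:
Append the final group after the loop exits

Combining these two fixes, you get: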

start = 0
new_values = []
for i,val in enumerate(values): 
    if val == 0 and i != 0:  # Avoid adding empty list
        new_values.append(values[start:i]) 
        start = i
if values:  # Handle edgecase for empty values where nothing to add
    new_values.append(values[start:])  # Add final list
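
As a quick sanity check of the corrected loop, here is a minimal sketch that wraps it in a function and runs it on a shortened stand-in for the real data. The function name split_at_zeros and the sample list are assumptions made for illustration; they do not appear in the question.

```python
def split_at_zeros(values):
    """Split a list into sublists, starting a new sublist at every 0."""
    start = 0
    new_values = []
    for i, val in enumerate(values):
        if val == 0 and i != 0:  # skip index 0 so no empty list is added
            new_values.append(values[start:i])
            start = i
    if values:  # nothing to append when the input is empty
        new_values.append(values[start:])  # add the final group
    return new_values


sample = [0, 1, 2, 3, 0, 1, 2, 0, 1]  # shortened stand-in data
print(split_at_zeros(sample))  # [[0, 1, 2, 3], [0, 1, 2], [0, 1]]
print(split_at_zeros([]))      # []
```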

I was going to add the cleaner groupby solution, which avoids the special cases for the beginning/end of the list, but Chris_Rands already handled that, so I'll refer you to his answer.

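Somewhat surprisingly, this actually appears to be the fastest solution, asymptotically, at the cost of requiring the input to be a list (whereas some of the other solutions can accept arbitrary iterables, including pure iterators, for which indexing is impossible).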
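
The list-only restriction can be seen with a small sketch. The function names split_by_slicing and split_by_groupby are hypothetical labels for the slicing loop above and for a groupby-based variant; the sample data is made up.

```python
from itertools import groupby


def split_by_slicing(values):
    # The corrected slicing loop: needs values[start:i], so it needs a sequence.
    start = 0
    new_values = []
    for i, val in enumerate(values):
        if val == 0 and i != 0:
            new_values.append(values[start:i])
            start = i
    if values:
        new_values.append(values[start:])
    return new_values


def split_by_groupby(values):
    # Only iterates over the input, so a one-shot iterator is acceptable.
    return [[0, *g] for k, g in groupby(values, bool) if k]


data = [0, 1, 2, 0, 1]
print(split_by_slicing(data))         # [[0, 1, 2], [0, 1]]
print(split_by_groupby(iter(data)))   # [[0, 1, 2], [0, 1]]

try:
    split_by_slicing(iter(data))      # iterators cannot be sliced
except TypeError as exc:
    print("slicing failed:", exc)
```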

For comparison (using the additional unpacking generalizations from Python 3.5 both for brevity and to get optimal performance on modern Python, and using the implicit booleanness of int to avoid comparing to 0, since the two are equivalent for int input but implicit booleanness is meaningfully faster):

from itertools import *

# truth is the same as bool, but unlike the bool constructor, it requires
# exactly one positional argument, which makes a *major* difference
# on runtime when it's in a hot code path
from operator import truth

def method1(values):
    # Optimized/correct OP's code
    # Only works on list inputs, and requires non-empty values to begin with 0,
    # but handles repeated 0s as separate groups properly
    new_values = []
    start = None
    for i, val in enumerate(values):
        if not val and i:
            new_values.append(values[start:i])
            start = i
    if values:
        new_values.append(values[start:])
    return new_values

def method2(values):
    # Works with arbitrary iterables and iterators, but doesn't handle
    # repeated 0s or non-empty values that don't begin with 0
    return [[0, *g] for k, g in groupby(values, truth) if k]

def method3(values):
    # Same behaviors and limitations as method1, but without verbose
    # special casing for begin and end
    start_indices = [i for i, val in enumerate(values) if not val]
    # End indices for all but terminal slice are previous start index
    # so make iterator and discard first value to pair properly
    end_indices = iter(start_indices)
    next(end_indices, None)
    # Pairing with zip_longest avoids need to explicitly pad end_indices
    return [values[s:e] for s, e in zip_longest(start_indices, end_indices)]

def method4(values):
    # Requires any non-empty values to begin with 0
    # but otherwise handles runs of 0s and arbitrary iterables (including iterators)
    new_values = []
    for val in values:
        if not val:
            curlist = [val]
            new_values.append(curlist)
            # Use pre-bound method in local name for speed
            curlist_append = curlist.append
        else:
            curlist_append(val)
    return new_values

def method5(values):
    # Most flexible solution; similar to method2, but handles all inputs, empty, non-empty,
    # with or without leading 0, with or without runs of repeated 0s
    new_values = []
    for nonzero, grp in groupby(values, truth):
        if nonzero:
            try:
                new_values[-1] += grp
            except IndexError:
                new_values.append([*grp])  # Only happens when values begins with nonzero
        else:
            new_values += [[0] for _ in grp]
    return new_values

Timings on Python 3.6, Linux x64, using the %timeit magic from ipython 6.1:

>>> values = [*range(100), *range(50), *range(150)]
>>> %timeit -r5 method1(values)
12.5 μs ± 50.6 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
>>> %timeit -r5 method2(values)
16.9 μs ± 54.9 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
>>> %timeit -r5 method3(values)
13 μs ± 18.9 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
>>> %timeit -r5 method4(values)
16.7 μs ± 9.51 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
>>> %timeit -r5 method5(values)
18.2 μs ± 25.2 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)

Synopsis:

Solutions that slice out the runs en masse (method1, method3) are the fastest, but they depend on the input being a sequence (and if the return type must be list, the input must be a list too, or conversions must be added).

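The groupby solutions (method2, method5) are a bit slower, but are generally quite succinct (handling all the edge cases as in method5 requires neither extreme verbosity nor explicit test-and-check LBYL patterns). They also don't need much hackery to run as fast as possible, aside from using operator.truth instead of bool. That swap is necessary because CPython's bool constructor is very slow, thanks to some odd implementation details (bool must accept full varargs, including keywords, and dispatch through the object construction machinery, which costs much more than operator.truth, which uses a low-overhead path that takes exactly one positional argument and bypasses the object construction machinery); if bool is used as the key function instead of operator.truth, the runtimes more than double (to 36.8 μs and 38.8 μs for method2 and method5 respectively).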
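
A rough way to check the bool versus operator.truth claim on your own interpreter is sketched below. The helper names with_bool and with_truth are made up for this illustration, and no timings are quoted because the gap depends on the Python version and machine; it may be much smaller on interpreters newer than the 3.6 used above.

```python
from itertools import groupby
from operator import truth
from timeit import timeit

values = [*range(100), *range(50), *range(150)]


def with_bool(values):
    # groupby keyed on the bool constructor
    return [[0, *g] for k, g in groupby(values, bool) if k]


def with_truth(values):
    # groupby keyed on operator.truth, which returns the same True/False values
    return [[0, *g] for k, g in groupby(values, truth) if k]


# Both key functions produce identical groupings.
assert with_bool(values) == with_truth(values)

# Print the total seconds for 2000 calls of each variant.
print("bool: ", timeit(lambda: with_bool(values), number=2000))
print("truth:", timeit(lambda: with_truth(values), number=2000))
```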

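In between is the slower but more flexible approach (it handles arbitrary input iterables, including iterators, handles runs of 0s without special casing, etc.) that appends item by item (method4). The problem is that getting maximum performance requires much more verbose code (because of the need to avoid repeated indexing and method binding); if the loop of method4 is changed to the much more succinct:

for val in values:
    if not val:
        new_values.append([])
    new_values[-1].append(val)

the runtime more than doubles (to ~34.4 μs), thanks to the cost of repeatedly indexing new_values and binding an append method over and over.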
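
The two loop styles can be compared side by side with a sketch like the following. The names append_prebound and append_indexed are hypothetical; the first mirrors the verbose method4 loop and the second the succinct variant. Timings are printed rather than quoted, since they vary by interpreter and machine.

```python
from timeit import timeit

values = [*range(100), *range(50), *range(150)]


def append_prebound(values):
    # Verbose form: keep the current sublist's append method in a local name.
    # Like method4, this assumes a non-empty input begins with 0.
    new_values = []
    for val in values:
        if not val:
            curlist = [val]
            new_values.append(curlist)
            curlist_append = curlist.append
        else:
            curlist_append(val)
    return new_values


def append_indexed(values):
    # Succinct form: re-index new_values[-1] and re-bind .append on every item.
    new_values = []
    for val in values:
        if not val:
            new_values.append([])
        new_values[-1].append(val)
    return new_values


# Same result either way; only the per-item overhead differs.
assert append_prebound(values) == append_indexed(values)

print("prebound:", timeit(lambda: append_prebound(values), number=2000))
print("indexed: ", timeit(lambda: append_indexed(values), number=2000))
```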

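In any event, personally, if performance were not absolutely critical, I'd use one of the groupby solutions with bool as the key, just to avoid imports and uncommon APIs. If performance mattered more, I'd probably still use groupby, but swap in operator.truth as the key function; sure, it's not as fast as the spelled-out version, but for people who know groupby it's easy enough to follow, and it's generally the most succinct solution for any given level of edge-case handling.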

Answer 2

You can use itertools.groupby to group the elements by the presence of 0 (which is falsy) and extract the sublists between the 0s, adding the missing 0 back with a list comprehension:

[[0]+list(g) for k, g in groupby(values, bool) if k]

Example:

>>> from itertools import groupby
>>> values = [0, 1, 2, 3, 4, 5 , 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 559]
>>> [[0]+list(g) for k, g in groupby(values, bool) if k]
[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 559]]
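
One caveat worth illustrating: because this approach discards the falsy groups and re-adds a single 0 per truthy group, it behaves differently when 0s repeat or when the input does not start with 0. The sketch below uses a hypothetical wrapper name, split_groupby, and made-up inputs.

```python
from itertools import groupby


def split_groupby(values):
    # Drop the runs of 0s, then prepend one 0 to every non-zero run.
    return [[0] + list(g) for k, g in groupby(values, bool) if k]


print(split_groupby([0, 1, 2, 0, 3]))     # [[0, 1, 2], [0, 3]]
print(split_groupby([0, 0, 1, 2, 0, 3]))  # [[0, 1, 2], [0, 3]]  (the extra leading 0 is lost)
print(split_groupby([5, 6, 0, 1]))        # [[0, 5, 6], [0, 1]]  (a 0 is added that was not there)
print(split_groupby([0, 1, 0]))           # [[0, 1]]  (the trailing 0 is lost)
```

For input shaped like the question's (every run starts with exactly one 0 and contains at least one non-zero value) none of these cases arise.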

Answer 3

You can use itertools.groupby to find all groups in which each value is no greater than the element that follows it in values:

import itertools
values = [0, 1, 2, 3, 4, 5, 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 5, 559]
new_vals = [[i[-1] for i in b] for a, b in itertools.groupby(enumerate(values), key=lambda x:x[-1] <= values[x[0]+1] if x[0]+1 < len(values) else False)]
final_data = [new_vals[i]+new_vals[i+1] for i in range(0, len(new_vals), 2)]

Output (the value of final_data):

[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 5, 559]]
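
The one-liner is easier to follow when the key function is named and the intermediate list is printed. The sketch below uses a shortened, made-up input and a hypothetical helper name, rises; it assumes, as the original does, that every run ends with a value larger than the 0 that follows it.

```python
import itertools

values = [0, 1, 2, 351, 0, 1, 750, 0, 559]  # shortened stand-in data


def rises(pair):
    # True while the current value is <= the next one; False for the last
    # element of each run and for the final element of the whole list.
    index, value = pair
    if index + 1 < len(values):
        return value <= values[index + 1]
    return False


# Alternating groups: the rising part of a run, then its final element.
new_vals = [[v for _, v in grp]
            for _, grp in itertools.groupby(enumerate(values), key=rises)]
print(new_vals)    # [[0, 1, 2], [351], [0, 1], [750], [0], [559]]

# Glue each rising part back onto the element that ended it.
final_data = [new_vals[i] + new_vals[i + 1] for i in range(0, len(new_vals), 2)]
print(final_data)  # [[0, 1, 2, 351], [0, 1, 750], [0, 559]]
```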

Answer 4

This should work:

values = [0, 1, 2, 3, 4, 5, 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 5, 559]
new_values = []

split_at = 0  # split the list when this value is reached

idx = -1
for value in values:
    if value == split_at:
        idx += 1
        new_values.append([])

    new_values[idx].append(value)

Output:

[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 5, 559]]

It also handles edge cases.
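
Which edge cases are covered can be checked with a small sketch. The wrapper name split_on and the test inputs are assumptions added for illustration. Empty input and repeated split values work, but an input that does not begin with the split value raises IndexError, because new_values is still empty when the first element is appended.

```python
def split_on(values, split_at=0):
    # Same loop as above, wrapped in a function for testing.
    new_values = []
    idx = -1
    for value in values:
        if value == split_at:
            idx += 1
            new_values.append([])
        new_values[idx].append(value)
    return new_values


print(split_on([]))            # []
print(split_on([0, 0, 1, 0]))  # [[0], [0, 1], [0]]

try:
    split_on([1, 2, 0, 3])     # does not start with the split value
except IndexError as exc:
    print("IndexError:", exc)
```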

My method is a bit faster than Chris_Rands's, but it is also a bit slower than Vasilis G's method:

import timeit
from itertools import groupby


values = [
    0, 1, 2, 3, 4, 5, 351,
    0, 1, 2, 3, 4, 5, 6, 750,
    0, 1, 2, 3, 4, 5, 559,
]


def method1():
    new_values = []

    idx = -1
    for value in values:
        if value == 0:
            idx += 1
            new_values.append([])

        new_values[idx].append(value)

    return new_values


def method2():
    new_values = [[0] + list(g) for k, g in groupby(values, bool) if k]
    return new_values


def method3():
    indices = [index for index, value in enumerate(values) if value == 0] + [len(values)]
    new_values = [values[indices[i]:indices[i + 1]] for i in range(len(indices) - 1)]
    return new_values

>>> timeit.timeit(method1, number=100000)
0.6725746986698414
>>> timeit.timeit(method2, number=100000)
0.8143814620314903
>>> timeit.timeit(method3, number=100000)
0.6596001360341748

Answer 5

You can also do it like this:

values = [0, 1, 2, 3, 4, 5, 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 5, 559]

# Find all indices whose element is 0.
indices = [index for index, value in enumerate(values) if value==0] + [len(values)]

# Split the list accordingly
values = [values[indices[i]:indices[i+1]] for i in range(len(indices)-1)]

print(values)

Output:

[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 5, 559]]
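
The same index-based idea can be sketched as a reusable function that pairs each start index with the next one using zip. The name split_by_indices and the split_at parameter are hypothetical. Note that anything before the first 0 is dropped, since slicing only begins at the first split index.

```python
def split_by_indices(values, split_at=0):
    # Positions where a new sublist begins.
    starts = [i for i, v in enumerate(values) if v == split_at]
    # Each sublist ends where the next one starts; the last ends at len(values).
    ends = starts[1:] + [len(values)]
    return [values[s:e] for s, e in zip(starts, ends)]


print(split_by_indices([0, 1, 2, 0, 1]))  # [[0, 1, 2], [0, 1]]
print(split_by_indices([7, 8, 0, 1]))     # [[0, 1]]  (the leading 7, 8 are dropped)
print(split_by_indices([]))               # []
```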
