Creating a list of lists in Python

Time: 2018-02-21 14:59:58

Tags: python list

I have a list named values that contains a series of numbers:

values = [0, 1, 2, 3, 4, 5, ... , 351, 0, 1, 2, 3, 4, 5, 6, ... , 750, 0, 1, 2, 3, 4, 5, ... , 559]

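I want to create a new list made of sublists, where each sublist runs from a 0 up to the last number before the next 0.

Like this: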

new_values = [[0, 1, 2, ... , 351], [0, 1, 2, ... , 750], [0, 1, 2, ... , 559]]

The code I wrote is:

start = 0
new_values = []
for i,val in enumerate(values): 
    if(val == 0):
        new_values.append(values[start:i]) 
        start = i

However, it returns:

new_values = [[], [0, 1, 2, ... , 750], [0, 1, 2, ... , 559]]

How can I fix my code? Any help would be much appreciated.

5 answers:

Answer 0 (score: 4)

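So the problem with the code you wrote is that it includes an empty list at the beginning and omits the final sub-list. The minimalist fix for this is:

  1. Change the test to avoid appending the first list (when i is 0), e.g. if val == 0 and i != 0:

  2. Append the final group after the loop exits

  3. Combining the two fixes, you'd have: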

    start = 0
    new_values = []
    for i,val in enumerate(values): 
        if val == 0 and i != 0:  # Avoid adding empty list
            new_values.append(values[start:i]) 
            start = i
    if values:  # Handle edge case of empty values, where there is nothing to add
        new_values.append(values[start:])  # Add final list
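
To check the two fixes together, the loop can be wrapped in a function and run on a shortened input. The function name split_at_zeros and the sample data are made up for illustration; this is only a sketch, and it assumes values is a list that starts with 0 whenever it is non-empty.

```python
def split_at_zeros(values):
    """Split a list into sublists, starting a new sublist at every 0."""
    start = 0
    new_values = []
    for i, val in enumerate(values):
        if val == 0 and i != 0:  # skip the empty slice at index 0
            new_values.append(values[start:i])
            start = i
    if values:  # nothing to add for an empty input
        new_values.append(values[start:])  # the final group
    return new_values


print(split_at_zeros([0, 1, 2, 0, 1, 2, 3, 0, 1]))  # [[0, 1, 2], [0, 1, 2, 3], [0, 1]]
print(split_at_zeros([]))                           # []
```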
    

I was going to add a cleaner groupby solution that avoids the special cases for the beginning and end of the list, but Chris_Rands already handled that, so I'll refer you to his answer.

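Somewhat surprisingly, this actually seems to be the fastest solution, asymptotically, at the cost of requiring the input to be a list (whereas some of the other solutions can accept arbitrary iterables, including pure iterators, for which indexing is impossible).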
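
The list requirement comes from the slicing: a plain iterator has nothing like values[start:i]. A small sketch of the difference, using made-up data; the groupby expression is the one from Chris_Rands's answer.

```python
from itertools import groupby

data = [0, 1, 2, 0, 1, 2, 3]

# Slicing works on a list...
assert data[0:3] == [0, 1, 2]

# ...but not on a pure iterator, which does not support indexing.
try:
    iter(data)[0:3]
except TypeError as exc:
    print("cannot slice an iterator:", exc)

# groupby only needs to iterate, so it accepts the iterator directly.
groups = [[0] + list(g) for k, g in groupby(iter(data), bool) if k]
print(groups)  # [[0, 1, 2], [0, 1, 2, 3]]
```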

For comparison (using Python 3.5's additional unpacking generalizations for both brevity and to get optimal performance on modern Python, and using the implicit booleanness of int to avoid comparing to 0, since the two are equivalent for int inputs but the implicit test is meaningfully faster):

    from itertools import groupby, zip_longest

    # truth is the same as bool, but unlike the bool constructor, it requires
    # exactly one positional argument, which makes a *major* difference
    # on runtime when it's in a hot code path
    from operator import truth

    def method1(values):
        # Optimized/correct OP's code
        # Only works on list inputs, and requires non-empty values to begin with 0,
        # but handles repeated 0s as separate groups properly
        new_values = []
        start = None
        for i, val in enumerate(values):
            if not val and i:
                new_values.append(values[start:i])
                start = i
        if values:
            new_values.append(values[start:])
        return new_values

    def method2(values):
        # Works with arbitrary iterables and iterators, but doesn't handle
        # repeated 0s or non-empty values that don't begin with 0
        return [[0, *g] for k, g in groupby(values, truth) if k]

    def method3(values):
        # Same behaviors and limitations as method1, but without verbose
        # special casing for begin and end
        start_indices = [i for i, val in enumerate(values) if not val]
        # End index for every slice but the last is the next start index,
        # so make iterator and discard first value to pair properly
        end_indices = iter(start_indices)
        next(end_indices, None)
        # Pairing with zip_longest avoids need to explicitly pad end_indices
        return [values[s:e] for s, e in zip_longest(start_indices, end_indices)]

    def method4(values):
        # Requires any non-empty values to begin with 0
        # but otherwise handles runs of 0s and arbitrary iterables (including iterators)
        new_values = []
        for val in values:
            if not val:
                curlist = [val]
                new_values.append(curlist)
                # Use pre-bound method in local name for speed
                curlist_append = curlist.append
            else:
                curlist_append(val)
        return new_values

    def method5(values):
        # Most flexible solution; similar to method2, but handles all inputs, empty, non-empty,
        # with or without leading 0, with or without runs of repeated 0s
        new_values = []
        for nonzero, grp in groupby(values, truth):
            if nonzero:
                try:
                    new_values[-1] += grp
                except IndexError:
                    new_values.append([*grp])  # Only happens when values begins with nonzero
            else:
                new_values += [[0] for _ in grp]
        return new_values

Timings on Python 3.6, Linux x64, using ipython 6.1's %timeit magic:

    >>> values = [*range(100), *range(50), *range(150)]
    >>> %timeit -r5 method1(values)
    12.5 μs ± 50.6 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
    >>> %timeit -r5 method2(values)
    16.9 μs ± 54.9 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
    >>> %timeit -r5 method3(values)
    13 μs ± 18.9 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
    >>> %timeit -r5 method4(values)
    16.7 μs ± 9.51 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
    >>> %timeit -r5 method5(values)
    18.2 μs ± 25.2 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)

Summary:

Solutions that slice out the runs en masse (method1, method3) are the fastest, but depend on the input being a sequence (and if the return type must be list, the input must be list too, or conversions must be added).

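The groupby solutions (method2, method5) are a bit slower, but are typically quite succinct (handling all the edge cases, as method5 does, requires neither extreme verbosity nor explicit test-and-check LBYL patterns). They also don't need much hackery to run as fast as possible, aside from using operator.truth instead of bool. That is necessary because CPython's bool constructor is very slow thanks to some odd implementation details (bool must accept full varargs, including keywords, and dispatches through the object construction machinery, which costs much more than operator.truth, which uses a low-overhead path that takes exactly one positional argument and bypasses the object construction machinery); if bool is used as the key function instead of operator.truth, the runtimes more than double (to 36.8 μs and 38.8 μs for method2 and method5 respectively).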

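In between is the slower but more flexible approach (it handles arbitrary input iterables, including iterators, handles runs of 0s without special casing, etc.) that appends item by item (method4). The problem is that getting maximum performance requires much more verbose code (because of the need to avoid repeated indexing and method binding); if the loop of method4 is changed to the much more succinct:

    for val in values:
        if not val:
            new_values.append([])
        new_values[-1].append(val)

the runtime more than doubles (to ~34.4 μs), thanks to the cost of repeatedly indexing new_values and binding an append method over and over.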

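In any event, personally, if performance were not absolutely critical, I'd use one of the groupby solutions with bool as the key, just to avoid imports and uncommon APIs. If performance mattered more, I'd probably still use groupby, but swap in operator.truth as the key function; sure, it's not as fast as the spelled-out version, but for people who know groupby it's easy enough to follow, and it's generally the most succinct solution for any given level of edge case handling.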
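
To try the bool versus operator.truth comparison on your own machine, something like the following should do. It is only a sketch: split_with is a made-up helper, and no timings are quoted because they depend on the interpreter and hardware.

```python
from itertools import groupby
from operator import truth
from timeit import timeit

values = [*range(100), *range(50), *range(150)]


def split_with(key):
    # Same comprehension either way; only the key function differs.
    return [[0, *g] for k, g in groupby(values, key) if k]


# Both keys give identical groups.
assert split_with(bool) == split_with(truth)

# Print the measured time for each key function.
for key in (bool, truth):
    seconds = timeit(lambda: split_with(key), number=1_000)
    print(f"{key.__name__}: {seconds:.4f} s for 1,000 calls")
```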

Answer 1 (score: 1)

You can use itertools.groupby to group the elements by the presence of 0 (which is falsy), extract the sublists between the 0s, and prepend the missing 0 with a list comprehension:

[[0]+list(g) for k, g in groupby(values, bool) if k]

Example:

>>> from itertools import groupby
>>> values = [0, 1, 2, 3, 4, 5 , 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 559]
>>> [[0]+list(g) for k, g in groupby(values, bool) if k]
[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 559]]
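
One caveat, which another answer on this page also points out: the comprehension assumes every group starts with exactly one 0. A small sketch with made-up inputs (split_groupby is just a wrapper name) shows what happens otherwise.

```python
from itertools import groupby


def split_groupby(values):
    return [[0] + list(g) for k, g in groupby(values, bool) if k]


# Well-formed input: every group starts with a single 0.
print(split_groupby([0, 1, 2, 0, 1]))  # [[0, 1, 2], [0, 1]]

# Consecutive 0s collapse into one falsy group, so a [0] group is lost.
print(split_groupby([0, 0, 1, 2]))     # [[0, 1, 2]]

# A leading non-zero run gets a 0 it never had.
print(split_groupby([5, 6, 0, 1]))     # [[0, 5, 6], [0, 1]]
```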

Answer 2 (score: 1)

You can use itertools.groupby to find all groups in which each value is less than or equal to the element that follows it in values:

import itertools
values = [0, 1, 2, 3, 4, 5, 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 5, 559]
new_vals = [[i[-1] for i in b] for a, b in itertools.groupby(enumerate(values), key=lambda x:x[-1] <= values[x[0]+1] if x[0]+1 < len(values) else False)]
final_data = [new_vals[i]+new_vals[i+1] for i in range(0, len(new_vals), 2)]
print(final_data)

Output:

[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 5, 559]]
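
It may help to see the intermediate result. The sketch below uses a shortened, made-up input and a named key function (rises) in place of the lambda. It relies on rising runs and single peaks alternating, which holds for data shaped like the question's but is an assumption in general.

```python
import itertools

values = [0, 1, 5, 0, 2, 0, 1, 3]  # shortened, made-up input


def rises(pair):
    # True while the next element is at least as large as this one.
    index, value = pair
    return index + 1 < len(values) and value <= values[index + 1]


# groupby alternates between a rising run and the single peak that ends it.
new_vals = [[value for _, value in group]
            for _, group in itertools.groupby(enumerate(values), key=rises)]
print(new_vals)    # [[0, 1], [5], [0], [2], [0, 1], [3]]

# Joining each run with its peak rebuilds the groups.
final_data = [new_vals[i] + new_vals[i + 1] for i in range(0, len(new_vals), 2)]
print(final_data)  # [[0, 1, 5], [0, 2], [0, 1, 3]]
```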

Answer 3 (score: 1)

This should work:

values = [0, 1, 2, 3, 4, 5, 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 5, 559]
new_values = []

split_at = 0  # split the list when this value is reached

idx = -1
for value in values:
    if value == split_at:
        idx += 1
        new_values.append([])

    new_values[idx].append(value)

Output:

[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 5, 559]]

It also handles edge cases.
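
A quick check of which edge cases the loop covers, wrapped in a hypothetical split_on helper: an empty list and repeated split values work, while an input that does not start with the split value raises IndexError, because there is no sublist to append to yet.

```python
def split_on(values, split_at=0):
    new_values = []
    idx = -1
    for value in values:
        if value == split_at:
            idx += 1
            new_values.append([])
        new_values[idx].append(value)
    return new_values


print(split_on([]))            # []
print(split_on([0, 0, 1, 2]))  # [[0], [0, 1, 2]]

# No leading split value: new_values is still empty on the first append.
try:
    split_on([5, 0, 1])
except IndexError as exc:
    print("IndexError:", exc)
```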

My method is a bit faster than Chris_Rands's, but it is also a bit slower than Vasilis G's:

from itertools import groupby


values = [
    0, 1, 2, 3, 4, 5, 351,
    0, 1, 2, 3, 4, 5, 6, 750,
    0, 1, 2, 3, 4, 5, 559,
]


def method1():
    new_values = []

    idx = -1
    for value in values:
        if value == 0:
            idx += 1
            new_values.append([])

        new_values[idx].append(value)

    return new_values


def method2():
    new_values = [[0] + list(g) for k, g in groupby(values, bool) if k]
    return new_values


def method3():
    indices = [index for index, value in enumerate(values) if value == 0] + [len(values)]
    new_values = [values[indices[i]:indices[i + 1]] for i in range(len(indices) - 1)]
    return new_values

>>> import timeit
>>> timeit.timeit(method1, number=100000)
0.6725746986698414
>>> timeit.timeit(method2, number=100000)
0.8143814620314903
>>> timeit.timeit(method3, number=100000)
0.6596001360341748

Answer 4 (score: 0)

You can also do it like this:

values = [0, 1, 2, 3, 4, 5, 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 5, 559]

# Find all indices whose element is 0.
indices = [index for index, value in enumerate(values) if value==0] + [len(values)]

# Split the list accordingly
values = [values[indices[i]:indices[i+1]] for i in range(len(indices)-1)]

print(values)

Output:

[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 5, 559]]
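
The index pairing can also be written with zip, which avoids the range(len(...)) arithmetic. This is a variation on the code above rather than the answerer's own version; the function name split_by_indices is made up, and like the original it drops anything that comes before the first 0.

```python
def split_by_indices(values):
    # Positions of every 0, plus the end of the list as the final boundary.
    indices = [i for i, value in enumerate(values) if value == 0] + [len(values)]
    # Pair each boundary with the next one and slice between them.
    return [values[s:e] for s, e in zip(indices, indices[1:])]


values = [0, 1, 2, 3, 0, 1, 2, 0, 1]
print(split_by_indices(values))  # [[0, 1, 2, 3], [0, 1, 2], [0, 1]]
```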