I have a list named values that contains a series of numbers:
values = [0, 1, 2, 3, 4, 5, ... , 351, 0, 1, 2, 3, 4, 5, 6, ... , 750, 0, 1, 2, 3, 4, 5, ... , 559]
I want to create a new list of sublists, where each sublist runs from a 0 up to the last number before the next 0.
Like this:
new_values = [[0, 1, 2, ... , 351], [0, 1, 2, ... , 750], [0, 1, 2, ... , 559]]
The code I wrote is:
start = 0
new_values = []
for i,val in enumerate(values):
    if(val == 0):
        new_values.append(values[start:i])
        start = i
However, it returns:
new_values = [[], [0, 1, 2, ... , 750], [0, 1, 2, ... , 559]]
How can I fix my code? Any help would be greatly appreciated.
Answer 0 (score: 4)
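The problem with the code you wrote is that it includes an empty list at the beginning and omits the final sublist. The minimal fix is to:
1. Change the test to avoid appending the first list (when i is 0), e.g. if val == 0 and i != 0:
2. Append the final group after the loop exits
Combining these two fixes, you get: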
start = 0
new_values = []
for i,val in enumerate(values):
    if val == 0 and i != 0:  # Avoid adding empty list
        new_values.append(values[start:i])
        start = i
if values:  # Handle edge case for empty values where nothing to add
    new_values.append(values[start:])  # Add final list
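To check the two fixes together, the corrected loop can be wrapped in a function and run on a shortened input. This is only a minimal sketch: the function name split_at_zeros and the sample data are assumptions made for illustration and do not appear in the original code.

```python
def split_at_zeros(values):
    """Split a list into sublists, starting a new sublist at every 0."""
    start = 0
    new_values = []
    for i, val in enumerate(values):
        if val == 0 and i != 0:  # skip i == 0 so no empty list is added
            new_values.append(values[start:i])
            start = i
    if values:  # nothing to append for an empty input
        new_values.append(values[start:])  # the final run
    return new_values


sample = [0, 1, 2, 0, 1, 2, 3, 0, 1]  # hypothetical short input
print(split_at_zeros(sample))  # [[0, 1, 2], [0, 1, 2, 3], [0, 1]]
```

Because the function slices values, it still expects a list (or another sliceable sequence) as input.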
I was going to add a cleaner groupby solution that avoids the special cases at the beginning and end of the list, but Chris_Rands already handled that, so I'll refer you to that answer.
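Somewhat surprisingly, this actually seems to be the fastest solution asymptotically, at the cost of requiring the input to be a list (some of the other solutions can accept arbitrary iterables, including pure iterators, for which indexing is impossible).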
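The list requirement comes from slicing: a pure iterator cannot be indexed. As a rough sketch of what an iterator-friendly alternative can look like, the generator below never indexes its input. The name runs_from_iterable is hypothetical, and this is not one of the methods timed in this answer.

```python
def runs_from_iterable(iterable):
    """Yield one list per run, starting a new run at each 0.

    Works on any iterable because it never indexes or slices the input.
    Values seen before the first 0 form their own leading run.
    """
    current = []
    for val in iterable:
        if val == 0 and current:
            yield current
            current = []
        current.append(val)
    if current:
        yield current


gen = (n for n in [0, 1, 2, 0, 1])  # a pure iterator: gen[0:2] would raise TypeError
print(list(runs_from_iterable(gen)))  # [[0, 1, 2], [0, 1]]
```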
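For comparison (using Python 3.5's additional unpacking generalizations, both for brevity and to get the best performance on modern Python, and using the implicit booleanness of int to avoid comparing against 0, since the two are equivalent for int inputs but implicit booleanness is meaningfully faster):
from itertools import groupby, zip_longest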
# truth is the same as bool, but unlike the bool constructor, it requires
# exactly one positional argument, which makes a *major* difference
# on runtime when it's in a hot code path
from operator import truth

def method1(values):
    # Optimized/corrected version of OP's code
    # Only works on list inputs, and requires non-empty values to begin with 0,
    # but handles repeated 0s as separate groups properly
    new_values = []
    start = None
    for i, val in enumerate(values):
        if not val and i:
            new_values.append(values[start:i])
            start = i
    if values:
        new_values.append(values[start:])
    return new_values

def method2(values):
    # Works with arbitrary iterables and iterators, but doesn't handle
    # repeated 0s or non-empty values that don't begin with 0
    return [[0, *g] for k, g in groupby(values, truth) if k]

def method3(values):
    # Same behaviors and limitations as method1, but without verbose
    # special casing for begin and end
    start_indices = [i for i, val in enumerate(values) if not val]
    # End indices for all but terminal slice are previous start index
    # so make iterator and discard first value to pair properly
    end_indices = iter(start_indices)
    next(end_indices, None)
    # Pairing with zip_longest avoids need to explicitly pad end_indices
    return [values[s:e] for s, e in zip_longest(start_indices, end_indices)]

def method4(values):
    # Requires any non-empty values to begin with 0
    # but otherwise handles runs of 0s and arbitrary iterables (including iterators)
    new_values = []
    for val in values:
        if not val:
            curlist = [val]
            new_values.append(curlist)
            # Use pre-bound method in local name for speed
            curlist_append = curlist.append
        else:
            curlist_append(val)
    return new_values

def method5(values):
    # Most flexible solution; similar to method2, but handles all inputs, empty, non-empty,
    # with or without leading 0, with or without runs of repeated 0s
    new_values = []
    for nonzero, grp in groupby(values, truth):
        if nonzero:
            try:
                new_values[-1] += grp
            except IndexError:
                new_values.append([*grp])  # Only happens when values begins with nonzero
        else:
            new_values += [[0] for _ in grp]
    return new_values
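The index pairing in method3 is the least obvious step, so here it is in isolation on a short, made-up input. It is only meant to show what zip_longest produces when the iterator of end indices runs out one item early.

```python
from itertools import zip_longest

values = [0, 1, 2, 0, 1, 0, 1, 2, 3]  # hypothetical short input
start_indices = [i for i, val in enumerate(values) if not val]  # [0, 3, 5]

end_indices = iter(start_indices)
next(end_indices, None)  # drop the first start so the ends lag the starts by one

# zip_longest pads the exhausted end_indices with None, and a slice
# ending at None runs to the end of the list.
pairs = list(zip_longest(start_indices, end_indices))
print(pairs)  # [(0, 3), (3, 5), (5, None)]
print([values[s:e] for s, e in pairs])  # [[0, 1, 2], [0, 1], [0, 1, 2, 3]]
```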
Timings on Python 3.6, Linux x64, using ipython 6.1's %timeit magic:
>>> values = [*range(100), *range(50), *range(150)]
>>> %timeit -r5 method1(values)
12.5 μs ± 50.6 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
>>> %timeit -r5 method2(values)
16.9 μs ± 54.9 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
>>> %timeit -r5 method3(values)
13 μs ± 18.9 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
>>> %timeit -r5 method4(values)
16.7 μs ± 9.51 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
>>> %timeit -r5 method5(values)
18.2 μs ± 25.2 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
Takeaways:
Solutions that slice out the runs in bulk (method1, method3) are the fastest, but they depend on the input being a sequence (and if the return type must be list, the input must be a list too, or a conversion must be added).
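The groupby solutions (method2, method5) are somewhat slower, but they are typically quite succinct: handling all the edge cases, as method5 does, requires neither extreme verbosity nor explicit test-and-check LBYL patterns. They also need little hackery to run as fast as possible, apart from using operator.truth instead of bool. That swap matters because CPython's bool constructor is very slow due to some odd implementation details: bool must accept full varargs, including keywords, and dispatches through the object construction machinery, which costs far more than operator.truth, which takes a low-overhead path that accepts exactly one positional argument and bypasses the object construction machinery. If bool is used as the key function instead of operator.truth, the runtimes more than double (to 36.8 μs and 38.8 μs for method2 and method5, respectively).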
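In between are the slower but more flexible approaches (they handle arbitrary input iterables, including iterators, handle runs of 0s without special casing, and so on) that use item-by-item appends (method4). The problem is that getting the best performance requires more verbose code, because repeated indexing and method binding have to be avoided. If the loop in method4 is changed to the more succinct:
for val in values:
    if not val:
        new_values.append([])
    new_values[-1].append(val)
the runtime more than doubles (to ~34.4 μs), owing to the cost of repeatedly indexing new_values and repeatedly binding the append method.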
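For anyone who wants to reproduce the comparison between the pre-bound and the succinct loop, a sketch along these lines should do. The function names verbose_loop and succinct_loop are made up here, and the timings will depend on the machine and Python version, so no expected numbers are given.

```python
from timeit import timeit


def verbose_loop(values):
    # method4 style: bind curlist.append once per run of values
    new_values = []
    for val in values:
        if not val:
            curlist = [val]
            new_values.append(curlist)
            curlist_append = curlist.append
        else:
            curlist_append(val)
    return new_values


def succinct_loop(values):
    # Shorter, but indexes new_values and binds append on every item
    new_values = []
    for val in values:
        if not val:
            new_values.append([])
        new_values[-1].append(val)
    return new_values


values = [*range(100), *range(50), *range(150)]

# Both loops must build the same sublists (both assume values starts with 0).
same_result = verbose_loop(values) == succinct_loop(values)
print(same_result)  # True

t_verbose = timeit(lambda: verbose_loop(values), number=2000)
t_succinct = timeit(lambda: succinct_loop(values), number=2000)
print(f"verbose: {t_verbose:.4f}s  succinct: {t_succinct:.4f}s")
```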
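In any case, if performance isn't absolutely critical, I'd personally use one of the groupby solutions with bool as the key, just to avoid imports and uncommon APIs. If performance matters more, I'd probably still use groupby, but swap in operator.truth as the key function. Sure, it's not as fast as the spelled-out version, but for people who know groupby it's easy enough to follow, and it's generally the most succinct solution for any given level of edge-case handling.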
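To see how much the key function matters on a particular interpreter, the two variants can be timed side by side. This is a sketch only: the names with_bool and with_truth are made up, and the gap reported above was measured on CPython 3.6, so it may well differ on other versions.

```python
from itertools import groupby
from operator import truth
from timeit import timeit

values = [*range(100), *range(50), *range(150)]


def with_bool(values):
    return [[0, *g] for k, g in groupby(values, bool) if k]


def with_truth(values):
    return [[0, *g] for k, g in groupby(values, truth) if k]


# The two key functions must produce identical groupings.
same = with_bool(values) == with_truth(values)
print(same)  # True

# Timings are machine- and version-dependent, so none are quoted here.
t_bool = timeit(lambda: with_bool(values), number=2000)
t_truth = timeit(lambda: with_truth(values), number=2000)
print(f"bool: {t_bool:.4f}s  truth: {t_truth:.4f}s")
```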
Answer 1 (score: 1)
You can use itertools.groupby to group the elements by whether they are 0 (which is falsy), extract the sublists between the 0s, and add the missing 0 back to each one with a list comprehension:
[[0]+list(g) for k, g in groupby(values, bool) if k]
Example:
>>> from itertools import groupby
>>> values = [0, 1, 2, 3, 4, 5 , 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 559]
>>> [[0]+list(g) for k, g in groupby(values, bool) if k]
[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 559]]
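One thing to keep in mind is that this one-liner rebuilds the 0s rather than keeping them, so it relies on every run starting with exactly one 0. A quick sketch of what happens otherwise; the wrapper name split_groupby and the inputs are invented for illustration.

```python
from itertools import groupby


def split_groupby(values):
    return [[0] + list(g) for k, g in groupby(values, bool) if k]


print(split_groupby([0, 1, 2, 0, 3]))  # [[0, 1, 2], [0, 3]]
print(split_groupby([0, 0, 1, 2]))     # [[0, 1, 2]]  (one of the two 0s is lost)
print(split_groupby([5, 6, 0, 1]))     # [[0, 5, 6], [0, 1]]  (a 0 is invented)
```

For input shaped like the question's, where each run begins with a single 0, none of this matters.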
Answer 2 (score: 1)
You can use itertools.groupby to find all groups in which each value is no greater than the element that follows it in values:
import itertools
values = [0, 1, 2, 3, 4, 5, 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 5, 559]
new_vals = [[i[-1] for i in b] for a, b in itertools.groupby(enumerate(values), key=lambda x:x[-1] <= values[x[0]+1] if x[0]+1 < len(values) else False)]
final_data = [new_vals[i]+new_vals[i+1] for i in range(0, len(new_vals), 2)]
print(final_data)
Output:
[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 5, 559]]
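The same idea, splitting wherever the sequence drops rather than wherever a 0 appears, can also be written as a plain loop. This is not the code from this answer, just a simpler sketch under the assumption that each run is non-decreasing; split_at_descents is a hypothetical name.

```python
def split_at_descents(values):
    """Start a new sublist whenever a value is smaller than the one before it."""
    result = []
    for i, val in enumerate(values):
        if i == 0 or val < values[i - 1]:
            result.append([])
        result[-1].append(val)
    return result


values = [0, 1, 2, 3, 4, 5, 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 5, 559]
print(split_at_descents(values))
# [[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 5, 559]]
```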
Answer 3 (score: 1)
This should work:
values = [0, 1, 2, 3, 4, 5, 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 5, 559]
new_values = []
split_at = 0  # split the list when this value is reached
idx = -1
for value in values:
    if value == split_at:
        idx += 1
        new_values.append([])
    new_values[idx].append(value)
Output:
[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 5, 559]]
It also handles edge cases.
My method is a little faster than Chris_Rands's, but it is also a little slower than Vasilis G's:
import timeit
from itertools import groupby

values = [
    0, 1, 2, 3, 4, 5, 351,
    0, 1, 2, 3, 4, 5, 6, 750,
    0, 1, 2, 3, 4, 5, 559,
]

def method1():
    new_values = []
    idx = -1
    for value in values:
        if value == 0:
            idx += 1
            new_values.append([])
        new_values[idx].append(value)
    return new_values

def method2():
    new_values = [[0] + list(g) for k, g in groupby(values, bool) if k]
    return new_values

def method3():
    indices = [index for index, value in enumerate(values) if value == 0] + [len(values)]
    new_values = [values[indices[i]:indices[i + 1]] for i in range(len(indices) - 1)]
    return new_values

>>> timeit.timeit(method1, number=100000)
0.6725746986698414
>>> timeit.timeit(method2, number=100000)
0.8143814620314903
>>> timeit.timeit(method3, number=100000)
0.6596001360341748
Answer 4 (score: 0)
You can also do it like this:
values = [0, 1, 2, 3, 4, 5, 351, 0, 1, 2, 3, 4, 5, 6, 750, 0, 1, 2, 3, 4, 5, 559]
# Find all indices whose element is 0.
indices = [index for index, value in enumerate(values) if value==0] + [len(values)]
# Split the list accordingly
values = [values[indices[i]:indices[i+1]] for i in range(len(indices)-1)]
print(values)
Output:
[[0, 1, 2, 3, 4, 5, 351], [0, 1, 2, 3, 4, 5, 6, 750], [0, 1, 2, 3, 4, 5, 559]]
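The index-based split can also be packaged as a function that leaves values untouched, pairing each start index with the next one via zip instead of range(len(...)). This is just a sketch; split_by_indices is a made-up name. Note that anything before the first 0 is dropped.

```python
def split_by_indices(values):
    # Positions of every 0, plus the list length as the final end index
    indices = [i for i, v in enumerate(values) if v == 0] + [len(values)]
    # Pair each start with the following index: (i0, i1), (i1, i2), ...
    return [values[s:e] for s, e in zip(indices, indices[1:])]


print(split_by_indices([0, 1, 2, 0, 1]))  # [[0, 1, 2], [0, 1]]
print(split_by_indices([7, 8, 0, 1]))     # [[0, 1]]  (the leading 7, 8 are dropped)
```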