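Given a list of elements [1, 27, 10, ...], I need to generate a list in which each element is repeated n times, e.g. [1, 1, ..., 1, 27, 27, ..., 27, 10, ..., 10].
What is the most elegant, most Pythonic and fastest way to do this?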
Answer
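numpy is the fastest and most concise solution.
np.repeat(my_list, n)
looks very Pythonic (credit to B.M.), while flattening a numpy array seems to be slightly faster.
See also the numba alternative in B.M.'s post below.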
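As a minimal illustration (the sample list [1, 27, 10] and n = 3 are assumed for the example), note that np.repeat returns a NumPy array rather than a list, so .tolist() is needed when a real Python list is required. The repeats argument also accepts one count per element.

```python
import numpy as np

my_list = [1, 27, 10]
n = 3

# np.repeat returns a NumPy array, not a Python list.
repeated = np.repeat(my_list, n)
as_list = repeated.tolist()  # convert back only if a real list is needed
print(as_list)  # [1, 1, 1, 27, 27, 27, 10, 10, 10]

# repeats may also be a sequence with one count per element.
uneven = np.repeat(my_list, [2, 1, 3]).tolist()
print(uneven)  # [1, 1, 27, 10, 10, 10]
```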
More details
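I tested 3 methods: i) a double loop, ii) a single loop with an indexing function, and iii) flattening a numpy array. (Edit: Mike's 4th method uses extend, B.M.'s 5th method uses np.repeat, gsb-eng's 6th method uses a list comprehension, and a 7th method uses itertools.)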
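Surprisingly, flattening an array turned out to be the fastest method on my machine with Python 2.7. However, on other machines and in Python 3 you may want to test the itertools and comprehension versions as well. You can copy/paste the Python 2 code below for a quick check. The sorted timeit results (total for 1000 runs each) are: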
Flattened array: 8.8ms
Numpy Repeat: 10.87ms
Extend List: 14.37ms
Itertools Repeat: 14.91ms
Itertools Chain Comprehension: 18.72ms
Itertools Chain: 18.73ms
Double Loop : 58.4ms
Single Loop + index by division: 251.29ms
Double Loop + comprehension: 255.76ms
Here is the code that produced these results:
import itertools
import numpy as np
import timeit

n = 100
my_list = range(10)
n_elements = len(my_list)

# === Double Loop =============================================================
def double_loop():
    my_long_list = []
    for list_element in my_list:
        my_long_list += [list_element] * n
    return my_long_list

# === Double Loop with Comprehension =========================================================
def double_loop_comp():
    # List comprehension
    return [i for i in my_list for j in xrange(n)]

# === Single Loop with Indexing Function ======================================
def one_loop_with_indexing():
    my_long_list = []
    for i in range(n*n_elements):
        my_long_list.append(my_list[i // n])
    return my_long_list

# === Flattened Array =========================================================
def flattened_array():
    my_array = np.zeros([n_elements, n])
    for i in range(n_elements):
        my_array[i,:] = my_list[i]
    return my_array.flatten()

# === Extend List =========================================================
def extend_list():
    my_long_list = []
    for list_element in my_list:
        my_long_list.extend([list_element] * n)
    return my_long_list

# === Numpy Repeat =========================================================
def numpy_repeat():
    return np.repeat(my_list, n)

# === Itertools Repeat ========================================================
def iter_repeat():
    my_long_list = []
    for x in my_list:
        my_long_list.extend( itertools.repeat(x,n) )
    return my_long_list

# === Itertools Chain =========================================================
def iter_chain():
    return list( itertools.chain.from_iterable( itertools.repeat(x,n) for x in my_list ) )

# === Itertools Chain Comp ====================================================
def iter_chain_comp():
    return list( itertools.chain.from_iterable( [itertools.repeat(x,n) for x in my_list] ) )

# Each function is run 1000 times; the printed figures are the totals in ms.
time_double_loop = timeit.timeit(double_loop, number=1000)
time_double_loop_comp = timeit.timeit(double_loop_comp, number=1000)
time_single_loop = timeit.timeit(one_loop_with_indexing, number=1000)
time_flattened_array = timeit.timeit(flattened_array, number=1000)
time_extend_list = timeit.timeit(extend_list, number=1000)
time_np_repeat = timeit.timeit(numpy_repeat, number=1000)
time_it_repeat = timeit.timeit(iter_repeat, number=1000)
time_it_chain = timeit.timeit(iter_chain, number=1000)
time_it_chain_comp = timeit.timeit(iter_chain_comp, number=1000)

print 'Double Loop : ' + str(round(time_double_loop*1000,2))+'ms'
print 'Double Loop + comprehension: ' + str(round(time_double_loop_comp*1000,2))+'ms'
print 'Single Loop + index by division: ' + str(round(time_single_loop*1000,2))+'ms'
print 'Flattened array: ' + str(round(time_flattened_array*1000,2))+'ms'
print 'Extend List: ' + str(round(time_extend_list*1000,2))+'ms'
print 'Numpy Repeat: ' + str(round(time_np_repeat*1000,2))+'ms'
print 'Itertools Repeat: ' + str(round(time_it_repeat*1000,2))+'ms'
print 'Itertools Chain: ' + str(round(time_it_chain*1000,2))+'ms'
print 'Itertools Chain Comprehension: ' + str(round(time_it_chain_comp*1000,2))+'ms'
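Since the benchmarked functions return different types (lists, an integer NumPy array, and a float64 array from np.zeros), a quick correctness check before timing can be useful. The following Python 3 sketch, with a small assumed n and list, re-implements a few of the variants under the same names and compares their output; it is an illustration, not part of the original benchmark.

```python
import itertools
import numpy as np

n = 3
my_list = [1, 27, 10]
expected = [1, 1, 1, 27, 27, 27, 10, 10, 10]

def double_loop():
    out = []
    for x in my_list:
        out += [x] * n
    return out

def comprehension():
    return [x for x in my_list for _ in range(n)]

def iter_chain():
    return list(itertools.chain.from_iterable(itertools.repeat(x, n) for x in my_list))

def numpy_repeat():
    return np.repeat(my_list, n)

def flattened_array():
    arr = np.zeros([len(my_list), n])  # np.zeros defaults to float64
    for i, x in enumerate(my_list):
        arr[i, :] = x
    return arr.flatten()

# The list-based versions should match exactly.
for f in (double_loop, comprehension, iter_chain):
    assert f() == expected, f.__name__

# The NumPy versions return arrays: compare by value, then look at the dtype.
assert numpy_repeat().tolist() == expected
assert flattened_array().tolist() == expected  # passes because 1.0 == 1
print(flattened_array().dtype)  # float64, although the input was ints
```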
Answer 0 (score: 3)
I replaced the flattened array with a native list comprehension:
[i for i in my_list for j in xrange(n)]
which is a more Pythonic solution for this problem.
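The two for clauses in the comprehension nest from left to right, so the first one is the outer loop and each element is emitted n times before the next one. A small Python 3 sketch (range in place of xrange; sample values assumed) showing the equivalent explicit loops:

```python
n = 3
my_list = [1, 27, 10]

# Leftmost "for" is the outer loop, the second "for" is the inner loop.
result = [i for i in my_list for _ in range(n)]
print(result)  # [1, 1, 1, 27, 27, 27, 10, 10, 10]

# Equivalent explicit nested loops.
expanded = []
for i in my_list:
    for _ in range(n):
        expanded.append(i)
```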
Here are the timeit results for the same setup:
Double Loop :0.0249750614166
Single Loop + indexing function: 0.198489904404
List comprehension: 0.00534200668335
Below is the full code after adding the list comprehension entry:
import timeit

n = 100
my_list = range(10)
n_elements = len(my_list)

# === Double Loop =============================================================
def double_loop():
    my_long_list = []
    for list_element in my_list:
        my_long_list += [list_element] * n
    return my_long_list

def double_loop_comp():
    # List comprehension
    return [i for i in my_list for j in xrange(n)]

# === Single Loop with Indexing Function ======================================
def one_loop_with_indexing():
    my_long_list = []
    for i in range(n*n_elements):
        my_long_list.append(my_list[i // n])  # floor division; plain "/" breaks under "from __future__ import division"
    return my_long_list

time_double_loop = timeit.timeit(double_loop, number=1000)
time_single_loop = timeit.timeit(one_loop_with_indexing, number=1000)
time_double_loop_comp = timeit.timeit(double_loop_comp, number=1000)

print 'Double Loop :' + str(time_double_loop)
print 'Single Loop + indexing function: ' + str(time_single_loop)
print 'List comprehension: ' + str(time_double_loop_comp)
Answer 1 (score: 1)
You can use extend():
def extend_list():
    my_long_list = []
    for list_element in my_list:
        my_long_list.extend([list_element] * n)
    return my_long_list
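One caveat that applies to extend([x] * n) and the other list-based approaches alike: they insert n references to the same object, not n copies. That is harmless for ints as in the question, but it matters if the elements are mutable. A sketch with hypothetical list-valued elements, using copy.copy when independent copies are wanted:

```python
import copy

n = 2
my_list = [[0], [1]]  # hypothetical mutable elements, for illustration only

shared = []
for element in my_list:
    shared.extend([element] * n)  # n references to the same inner list
print(shared[0] is shared[1])  # True

independent = []
for element in my_list:
    independent.extend(copy.copy(element) for _ in range(n))  # n shallow copies
print(independent[0] is independent[1])  # False
```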
It is faster on my machine:
Double Loop :0.0226180553436
Single Loop + indexing function: 0.300093889236
Flattened array: 0.0395331382751
Extend List: 0.0189819335938
A list comprehension that produces the correct result:
def double_loop_comp():
    return [i for i in my_list for j in xrange(n)]
The list comprehension is slower:
Double Loop :0.016893863678
Single Loop + indexing function: 0.300258874893
Flattened array: 0.0327677726746
Extend List: 0.0180258750916
Comp: 0.0602869987488
Answer 2 (score: 1)
In the fast category, with a=np.array(my_list) (100 elements in the test):
Readable:
In [12]: %timeit np.repeat(a,100)
10000 loops, best of 3: 80.4 µs per loop
Tricky:
In [13]: %timeit np.lib.stride_tricks.as_strided(a,(100,100),(a.itemsize,0)).ravel()
10000 loops, best of 3: 29.5 µs per loop
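The strided version builds a 2-D view of shape (len(a), n) whose second-axis stride is 0, so every row repeats one element, and ravel() then copies it into a flat array. Below is a sketch of the same idea using np.broadcast_to, which produces a read-only zero-stride view without hand-written strides and derives the shape from a.size and n instead of hard-coding (100, 100). Its speed relative to as_strided has not been measured here; the sample values are assumed.

```python
import numpy as np

a = np.array([1, 27, 10])
n = 3

# Read-only view of shape (a.size, n); a[:, np.newaxis] has stride 0 on axis 1,
# and broadcasting keeps that stride, so row i repeats a[i] without copying.
view = np.broadcast_to(a[:, np.newaxis], (a.size, n))

# The view is not contiguous, so ravel() has to make a flat copy.
result = view.ravel()
print(result.tolist())  # [1, 1, 1, 27, 27, 27, 10, 10, 10]
```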
Compiled just-in-time with numba (after conda install numba):
import numpy as np
from numba import jit

@jit
def numbarep(a,n):
    res=np.empty(a.size*n,dtype=a.dtype)
    offset=0
    for e in a:
        for k in range(offset,offset+n):
            res[k]=e
        offset+=n
    return res
In [14]: %timeit numbarep(a,100)
100000 loops, best of 3: 14.8 µs per loop
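If numba is not available, the same output layout can be filled in plain NumPy by writing each block of n slots with a single slice assignment instead of the inner Python loop. slice_fill_repeat is a hypothetical name, and no timing is claimed for it:

```python
import numpy as np

def slice_fill_repeat(a, n):
    """Hypothetical helper: same layout as numbarep, one slice assignment per element."""
    a = np.asarray(a)
    res = np.empty(a.size * n, dtype=a.dtype)  # keep the input dtype
    for i, e in enumerate(a):
        res[i * n:(i + 1) * n] = e  # fill the i-th block of n slots
    return res

print(slice_fill_repeat([1, 27, 10], 3).tolist())  # [1, 1, 1, 27, 27, 27, 10, 10, 10]
```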
Answer 3 (score: 1)
You can also use itertools to build the list; on my machine it is also the fastest:
import itertools as it

n = 100
my_list = range(10)
n_elements = len(my_list)

def iter_repeat():
    my_long_list = []
    for x in my_list:
        my_long_list.extend( it.repeat(x,n) )
    return my_long_list

def iter_chain():
    return list( it.chain.from_iterable( it.repeat(x,n) for x in my_list ) )

def iter_chain_comp():
    return list( it.chain.from_iterable( [it.repeat(x,n) for x in my_list] ) )
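The answer does not show how the sorted "order" listing below was produced. One possible Python 3 approach (not necessarily the author's script) is to collect the timeit totals in a dict and print them sorted by time; the function names mirror the ones above:

```python
import itertools as it
import timeit

n = 100
my_list = list(range(10))

def iter_repeat():
    my_long_list = []
    for x in my_list:
        my_long_list.extend(it.repeat(x, n))
    return my_long_list

def iter_chain():
    return list(it.chain.from_iterable(it.repeat(x, n) for x in my_list))

def iter_chain_comp():
    return list(it.chain.from_iterable([it.repeat(x, n) for x in my_list]))

# Total seconds for 1000 calls of each function.
timings = {f.__name__: timeit.timeit(f, number=1000)
           for f in (iter_repeat, iter_chain, iter_chain_comp)}

# Print fastest first, in the same "seconds name" style as the listing below.
for seconds, name in sorted((t, name) for name, t in timings.items()):
    print(seconds, name)
```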
Testing it with your script, I got these timings in Python 3:
Double Loop : 0.015303148491881732
Double Loop + comprehension : 0.04365179467151968
Single Loop + index by division: 0.3784320416645417
Extend List: 0.01603116899830609
Flattened array: 0.018885064147608488
Numpy Repeat: 0.0254420405658366
Itertools repeat extend: 0.015163157712790254
Itertools chain repeat: 0.025397544719181653
Itertools chain repeat comp: 0.025096342064901633
Sorted:
0.015163157712790254 time_iter_repeat
0.015303148491881732 time_double_loop
0.01603116899830609 time_extend_list
0.018885064147608488 time_flattened_array
0.025096342064901633 time_iter_chain_comp
0.025397544719181653 time_iter_chain
0.0254420405658366 time_np_repeat
0.04365179467151968 time_double_loop_comp
0.3784320416645417 time_single_loop
In Python 2 I got:
Double Loop : 0.0188628162243
Double Loop + comprehension : 0.069114371782
Single Loop + index by division: 0.239681327592
Extend List: 0.0197920948679
Itertools repeat extend: 0.0275025405417
Itertools chain repeat: 0.0315609040324
Itertools chain repeat comp: 0.0317361492131
Sorted:
0.0188628162243 time_double_loop
0.0197920948679 time_extend_list
0.0275025405417 time_iter_repeat
0.0315609040324 time_iter_chain
0.0317361492131 time_iter_chain_comp
0.069114371782 time_double_loop_comp
0.239681327592 time_single_loop
(I don't use numpy with Python 2 and installing it is a pain, so I didn't run that test.)
Answer 4 (score: 0)
If the result only needs to be an iterable rather than a list, you can "cheat" and create just a generator:
import itertools as it

def make_generator():
    return it.chain.from_iterable(it.repeat(elem, n) for elem in my_list)
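Keep in mind that such a generator is lazy and can be consumed only once. A short sketch with assumed values n = 3 and my_list = [1, 27, 10]:

```python
import itertools as it

n = 3
my_list = [1, 27, 10]

def make_generator():
    return it.chain.from_iterable(it.repeat(elem, n) for elem in my_list)

gen = make_generator()
print(list(it.islice(gen, 4)))  # [1, 1, 1, 27]  (takes only the first 4 items)
print(list(gen))                # [27, 27, 10, 10, 10]  (the rest)
print(list(gen))                # []  (already exhausted)
```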