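I have two lists: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] and l2 = [0.5, 1.0, 1.5, 2.0]. I want to split l1 into sublists, where each sublist holds the elements of l1 that fall between two consecutive values of l2. For example, the result would be [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]].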
Here is my solution:
l3 = []
b = 0
for i in l2:
    temp = []
    for p in l1:
        if b <= p < i:
            temp.append(p)
    l3.append(temp)
    b += 0.5
This solution is a huge bottleneck in my code. Is there a faster way?
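Answer 0 (score: 4)
Your lists are sorted, so there is no need for a double loop here.
The following generator produces the sublists from the two input lists: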
def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist
You can then iterate over partition(l1, l2) to get the individual sublists, or call list() to produce the whole list of lists in one go:
>>> l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
>>> l2 = [0.5, 1.0, 1.5, 2.0]
>>> list(partition(l1, l2))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
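Two edge cases of the generator above are worth seeing in a small sketch. The inputs here are made up for illustration and are not from the question. Because of the `if sublist:` check, an interval that contains no values is skipped rather than yielded as an empty list, and values at or above the last boundary are never emitted.

```python
def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

# Hypothetical inputs: no value falls in [0.5, 1.0), so that interval is skipped.
skipped = list(partition([0.1, 1.2], [0.5, 1.0, 1.5]))
print(skipped)  # [[0.1], [1.2]]

# Hypothetical inputs: 2.5 is above the last boundary, so it is left out.
dropped = list(partition([0.1, 2.5], [0.5, 1.0]))
print(dropped)  # [[0.1]]
```

If you need one sublist per interval regardless, removing the `if sublist:` check would be one way to get that.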
Answer 1 (score: 2)
As a faster approach, you can use numpy, which handles large lists very efficiently:
>>> np.split(l1,np.searchsorted(l1,l2))
[array([ 0. , 0.002, 0.3 ]), array([ 0.5, 0.6, 0.9]), array([ 1.3]), array([ 1.9]), array([], dtype=float64)]
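np.searchsorted finds the indices at which the items of l2 would have to be inserted into l1 for l1 to stay sorted (using its default side, 'left'), and np.split then splits your list at that list of indices.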
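The same "find the cut indices, then slice" idea can be sketched without numpy using the standard library's `bisect` module. This is only an illustration of the technique, not what numpy does internally; the helper name `split_at_boundaries` is made up here.

```python
from bisect import bisect_left

def split_at_boundaries(values, boundaries):
    """Split a sorted list at each boundary (hypothetical helper)."""
    # bisect_left returns the first position whose value is >= the boundary,
    # which corresponds to np.searchsorted's default side='left'.
    cuts = [bisect_left(values, b) for b in boundaries]
    starts = [0] + cuts
    ends = cuts + [len(values)]
    # Slice between consecutive cut points, like np.split does.
    return [values[s:e] for s, e in zip(starts, ends)]

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]
parts = split_at_boundaries(l1, l2)
print(parts)
# [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9], []]
```

As with np.split, a trailing empty piece appears because the last cut index equals the length of the list.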
A benchmark against the accepted answer on a list 1000 times longer:
from timeit import timeit
s1="""
def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
list(partition(l1, l2))
"""
s2="""
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
np.split(l1,np.searchsorted(l1,l2))
"""
print('1st: ', timeit(stmt=s1, number=10000))
print('2nd : ', timeit(stmt=s2, number=10000, setup="import numpy as np"))
Results:
1st: 17.5872459412
2nd : 10.3306460381
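Note that in the benchmark above the timed statements also include the function definition and the construction of the list, and `l1 * 1000` is not sorted. A minimal sketch of a tighter measurement, assuming Python 3 and timing only the `partition` call on sorted input, could look like this (the run count of 100 is an arbitrary choice):

```python
from timeit import timeit

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

# Build the data once, outside the timed call, and sort it first.
l1 = sorted([0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] * 1000)
l2 = [0.5, 1.0, 1.5, 2.0]

# timeit accepts a callable, so only the partitioning itself is measured.
elapsed = timeit(lambda: list(partition(l1, l2)), number=100)
print(f"partition: {elapsed:.4f} s for 100 runs")
```

The absolute numbers will differ from those quoted in this thread, since they depend on the machine and the Python version.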
Answer 2 (score: 1)
def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    for ele in a:
        if ele >= start:
            yield sub
            sub, start = [], next(it)
        sub.append(ele)
    yield sub
print(list(split_l(l1,l2)))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
Using Kasra's input, this beats both the accepted answer and the numpy solution:
In [14]: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
In [15]: l1.sort()
In [16]: l2 = [0.5, 1.0, 1.5, 2.0]
In [17]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.53 ms per loop
In [18]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 703 µs per loop
In [19]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 802 µs per loop
In [20]: list(split_l(l1,l2)) == list(partition(l1,l2))
Out[20]: True
Creating a local reference to append shaves off some more time:
def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    append = sub.append
    for ele in a:
        if start <= ele:
            yield sub
            start, sub = next(it), []
            append = sub.append
        append(ele)
    yield sub
This runs in a little over half the time of the numpy solution:
In [47]: l1.sort()
In [48]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 498 µs per loop
In [49]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.73 ms per loop
In [50]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 812 µs per loop
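One caveat with `split_l`: if a value in the first list reaches or passes the last boundary, `next(it)` runs out of boundaries, and on Python 3.7+ a StopIteration raised inside a generator surfaces as a RuntimeError. A hedged sketch of a more defensive variant follows. The name `split_l_safe` is made up here, and unlike the original it uses a `while` loop, so it also yields empty sublists for intervals that contain no values.

```python
def split_l_safe(a, b):
    it = iter(b)
    # next(it, None) returns None instead of raising when boundaries run out.
    start = next(it, None)
    sub = []
    for ele in a:
        # Close every interval that ele has already reached or passed.
        while start is not None and ele >= start:
            yield sub
            sub = []
            start = next(it, None)
        sub.append(ele)
    yield sub

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]
result = list(split_l_safe(l1, l2))
print(result)
# [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

# Hypothetical input with a value above the last boundary:
overflow = list(split_l_safe([0.1, 2.5], l2))
print(overflow)
# [[0.1], [], [], [], [2.5]]
```

The extra checks will likely cost some of the speed advantage shown in the timings above, so this trades performance for robustness.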
Answer 3 (score: 0)
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]
def partition(values, indices):
    temp = []
    p_list = []
    for j in range(len(indices)):
        for i in range(len(values)):
            if indices[j] > values[i]:
                temp.append(values[i])
        p_list.append(temp)
        # drop the values that were just added to this partition
        values = values[len(temp):]
        temp = []
    print(p_list)
partition(l1, l2)
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]