Python - 根据另一个列表将列表拆分为子列表

时间:2015-09-16 20:48:18

标签: python list python-2.7 split

我有两个列表:l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]l2 = [0.5, 1.0, 1.5, 2.0]。我想将l1拆分为子列表,这些子列表被定义为l2的两个索引之间的元素。因此,例如l1将等于[[0,0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

这是我的解决方案:

l3 = []
b=0
for i in l2:
    temp = []
    for p in l1:
        if b <= p < i:
        temp.append(p)
    l3.append(temp)
    b+=0.5

此解决方案是我的代码中的一个巨大瓶颈。有更快的方法吗?

4 个答案:

答案 0 :(得分:4)

您的列表已排序,因此无需在此处进行双循环。

以下基于两个列表作为输入生成子列表:

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

然后,您可以迭代partition(l1, l2)以获取单个子列表,或者调用list()一次性生成整个列表列表:

>>> l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] 
>>> l2 = [0.5, 1.0, 1.5, 2.0]
>>> list(partition(l1, l2))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

答案 1 :(得分:2)

作为一种快速的方法,您可以使用numpy非常有效的方式处理大型列表:

>>> np.split(l1,np.searchsorted(l1,l2))
[array([ 0.   ,  0.002,  0.3  ]), array([ 0.5,  0.6,  0.9]), array([ 1.3]), array([ 1.9]), array([], dtype=float64)]

np.searchsorted会在l2内找到l1个项目的索引,而l1仍然排序(默认排序),np.split会根据您的列表进行拆分在指数列表上。

基于1000个时间更长的列表中已接受答案的基准:

from timeit import timeit

s1="""

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
list(partition(l1, l2))

"""

s2="""
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
np.split(l1,np.searchsorted(l1,l2))
   """

print '1st: ' ,timeit(stmt=s1, number=10000)
print '2nd : ',timeit(stmt=s2, number=10000,setup="import numpy as np")

结果:

1st:  17.5872459412
2nd :  10.3306460381

答案 2 :(得分:1)

def split_l(a,b):
    it = iter(b)
    start, sub = next(it), []
    for ele in a:
        if ele >= start:
            yield sub
            sub, start = [], next(it)
        sub.append(ele)
    yield sub

print(list(split_l(l1,l2)))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

使用kasras输入这可以胜过接受的答案和numpy解决方案:

In [14]: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000

In [15]: l1.sort()

In [16]: l2 = [0.5, 1.0, 1.5, 2.0]

In [17]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.53 ms per loop

In [18]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 703 µs per loop

In [19]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 802 µs per loop

In [20]: list(split_l(l1,l2))  == list(partition(l1,l2))
Out[20]: True

创建一个本地引用以追加更多关闭:

def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    append = sub.append
    for ele in a:
        if start <= ele:
            yield sub
            start, sub = next(it), []
            append = sub.append
        append(ele)
    yield sub

在numpy解决方案的时间内运行:

In [47]: l1.sort()

In [48]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 498 µs per loop

In [49]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.73 ms per loop

In [50]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 812 µs per loop

答案 3 :(得分:0)

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]

l2 = [0.5, 1.0, 1.5, 2.0]


  def partition(values, indices):

    temp = []
    p_list = []


    for j in range(len(indices)):
        for i in range(len(values)):
            if indices[j] > values[i]:
                temp.append(values[i])

        p_list.append(temp)

        # added to the partition values are truncated from the list
        values = values[len(temp):]

        temp = []

    print(p_list)

分区(l1,l2)

[[0,0.002,0.3],[0.5,0.6,0.9],[1.3],[1.9]]