Python: find the indices that are *not* contained in a list of lists

Time: 2016-01-30 23:31:19

Tags: python list set

I have a list of lists containing indices, for example:

[[ 1955 16898 15202 18603]
 [ 7758 14357 13451 18447]
 [12883 13453 14576 14604]
 ..., 
 [  954 17712  1196  1250]
 [17712   859   954 18962]
 [  954   859 17712  1250]]

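The values range from 0 to 21000. Some entries occur more than once, but what I want to know is: which indices between 0 and 21000 are *not* contained in that list?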

The list can be large, so efficiency matters.

3 answers:

Answer 0 (score: 4)

First, you should use numpy. Then you can use setdiff1d and flatten:

import numpy as np

a = np.array(your_list_of_lists)

np.setdiff1d(np.arange(21000), a.flatten())

Edit:

To avoid copying the input twice, you can flatten the array with ravel:

import numpy as np

a = np.ravel(your_list_of_lists)

np.setdiff1d(np.arange(21000), a)
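
As a small end-to-end check of the numpy approach, here is a minimal sketch that uses a boolean mask instead of setdiff1d. This is a swapped-in variant, not the code from the answer above. The function name `missing_indices` and the inclusive `upper` bound are assumptions made for illustration.

```python
import numpy as np

def missing_indices(list_of_lists, upper):
    """Return the sorted indices in [0, upper] that never occur in list_of_lists.

    Hypothetical helper: assumes equally long sublists of non-negative
    integers, none larger than `upper` (inclusive bound is an assumption).
    """
    a = np.ravel(list_of_lists)              # 1-D array of all indices
    seen = np.zeros(upper + 1, dtype=bool)   # one flag per possible index
    seen[a] = True                           # duplicates are harmless here
    return np.flatnonzero(~seen)             # positions whose flag is still False

sample = [[1, 2, 3, 4], [5, 6, 7, 8], [1, 8, 2, 5]]
result = missing_indices(sample, 10)
print(result.tolist())  # [0, 9, 10]
```

The mask avoids the sorting that set-style array operations typically perform, at the cost of allocating one flag per index in the range.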

Answer 1 (score: 2)

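Of course, numpy is very fast when processing large amounts of data. However, I just want to offer a native approach using Python sets, since there may be environments where additional modules are not allowed (for example, due to security concerns).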

See the comments in the code for a detailed explanation:

# sample list containing some numbers from 0 to 10
# 0, 9 and 10 are missing and need to be found
l = [
    [1,2,3,4],
    [5,6,7,8],
    ]

# flatten/merge sublists
merged = [item for sublist in l for item in sublist]

# convert list to set
s = set(merged)

# define a set containing all numbers of the desired range
interval = set(range(0, 11))

# get the difference of both sets
# the difference are the elements which are missing
missing = interval.difference(s)
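
The same set-difference idea can be wrapped in a function. The sketch below is one possible packaging, not part of the original answer: the name `missing_from` and the inclusive `upper` parameter are assumptions, and `itertools.chain.from_iterable` replaces the nested list comprehension so no intermediate merged list is built.

```python
from itertools import chain

def missing_from(list_of_lists, upper):
    """Return the sorted numbers in [0, upper] that occur in no sublist.

    Hypothetical helper; `upper` is treated as inclusive (an assumption).
    """
    # chain.from_iterable walks the sublists lazily, so only the set is stored
    seen = set(chain.from_iterable(list_of_lists))
    # set difference: everything in the range that was never seen
    return sorted(set(range(upper + 1)) - seen)

sample = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
print(missing_from(sample, 10))  # [0, 9, 10]
```

Sets are unordered, so the result is sorted before returning to give a stable, readable list.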

Answer 2 (score: 0)

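Constant extra memory (one flag per index in the range, regardless of how large the input is), linear time, no dependencies: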

def missing(ll):
    present = [False for _ in range(21000 + 1)]

    for l in ll:
        for n in l:
            present[n] = True

    return [n for n, is_present in enumerate(present) if not is_present]
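
For experimenting with the flag-array approach on small inputs, here is a hedged variant with the upper bound passed in rather than hard-coded. The name `missing_in_range` and the use of a `bytearray` for the flags are illustrative choices, not part of the answer above.

```python
def missing_in_range(ll, upper):
    """Return the numbers in [0, upper] that appear in no sublist of ll.

    Hypothetical variant of the function above; assumes every value is an
    integer between 0 and `upper` inclusive.
    """
    present = bytearray(upper + 1)   # one zero byte per possible index

    for sublist in ll:
        for n in sublist:
            present[n] = 1           # mark the index as seen

    # indices whose flag is still 0 were never seen; result is already sorted
    return [n for n, flag in enumerate(present) if not flag]

# sublists may have different lengths and contain duplicates
sample = [[1, 2], [3, 4, 5, 6, 7, 8], [], [2, 2]]
print(missing_in_range(sample, 10))  # [0, 9, 10]
```

A `bytearray` stores one byte per flag, which is more compact than a list of Python booleans while keeping the same single pass over the data.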