I have a list of lists of indices, for example,
[[ 1955 16898 15202 18603]
[ 7758 14357 13451 18447]
[12883 13453 14576 14604]
...,
[ 954 17712 1196 1250]
[17712 859 954 18962]
[ 954 859 17712 1250]]
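with values ranging from 0 to 21000. Some entries occur more than once, but what I want to know is: which indices between 0 and 21000 are not contained in the list?
The list can be large, so efficiency matters.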
Answer 0 (score: 4)
First, you should use numpy. Then you can use setdiff1d and flatten:
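import numpy as np
# your_list_of_lists stands for your own data (all sublists of equal length)
a = np.array(your_list_of_lists)
# arange(21001) covers 0..21000 inclusive; use arange(21000) if 21000 itself is not a valid index
np.setdiff1d(np.arange(21001), a.flatten())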
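To see what setdiff1d returns, here is a self-contained run of the same approach on a small hypothetical input (indices 0..9 instead of 0..21000, so the result is easy to check by hand):

```python
import numpy as np

# Hypothetical small input: valid indices are 0..9 (inclusive)
rows = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [2, 5, 8, 1]]

a = np.array(rows)
# setdiff1d returns the sorted, unique values of the first array
# that do not appear in the second one
missing = np.setdiff1d(np.arange(10), a.flatten())
print(missing)  # [0 9]
```

Note that the duplicates in the input (2, 5, 8, 1) need no special handling; setdiff1d deduplicates internally.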
Edit:
To avoid copying the input twice, you can use ravel to flatten the array:
import numpy as np
# np.ravel returns a view when it can, instead of always making a copy like flatten
a = np.ravel(your_list_of_lists)
np.setdiff1d(np.arange(21001), a)
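The edit relies on the difference between the two flattening calls: ndarray.flatten always returns a new copy, while np.ravel returns a view of the original data when possible (for example, for a C-contiguous array). A quick check with np.shares_memory on a small hypothetical array:

```python
import numpy as np

a = np.arange(8).reshape(2, 4)  # C-contiguous 2x4 array

flat_copy = a.flatten()  # always a new copy of the data
flat_view = np.ravel(a)  # a view here, because `a` is contiguous

print(np.shares_memory(a, flat_copy))  # False
print(np.shares_memory(a, flat_view))  # True
```

When the input is a plain Python list, np.ravel still has to build an array from it, so one copy is unavoidable; ravel only avoids the second one that np.array(...).flatten() would make.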
Answer 1 (score: 2)
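Of course, numpy is very fast when handling large amounts of data. However, I'd like to offer a native approach using Python sets, since there may be environments where additional modules are not allowed (e.g., for security reasons).
See the comments in the code for a detailed explanation: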
# sample list containing some numbers from 0 to 10
# 0, 9 and 10 are missing and need to be found
l = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
# flatten/merge sublists
merged = [item for sublist in l for item in sublist]
# convert list to set
s = set(merged)
# define a set containing all numbers of the desired range
interval = set(range(0, 11))
# get the difference of both sets
# the difference are the elements which are missing
missing = interval.difference(s)
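A slightly more compact variant of the same set-based idea (a sketch, not the answerer's code): set.difference accepts any number of iterables, so the sublists can be passed to it directly without first building the intermediate merged list.

```python
# Same sample data as above: 0, 9 and 10 are missing from the range 0..10
l = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]

# set.difference takes any number of iterables, so each sublist is
# subtracted directly, with no flattened copy of the data
missing = set(range(0, 11)).difference(*l)
print(sorted(missing))  # [0, 9, 10]
```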
Answer 2 (score: 0)
Memory proportional to the index range (independent of the input size), linear time, no dependencies:
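def missing(ll):
    # One flag per possible index 0..21000 (inclusive)
    present = [False] * (21000 + 1)
    for sublist in ll:
        for n in sublist:
            present[n] = True
    # Every index whose flag was never set is missing
    return [n for n, is_present in enumerate(present) if not is_present]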
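The same flag-array idea also translates directly to numpy, which avoids the sort that np.setdiff1d performs: start with an all-True boolean mask over the index range, clear the positions that occur, and read back the remaining ones with np.flatnonzero. This is a sketch, assuming all indices are non-negative integers no larger than max_index (a hypothetical parameter, defaulting to 21000 inclusive) and that the sublists have equal length:

```python
import numpy as np

def missing_np(ll, max_index=21000):
    """Return the indices in 0..max_index (inclusive) that never occur in ll."""
    idx = np.ravel(ll)  # flatten the input into a 1-D integer array
    # One flag per possible index; True means "not seen yet"
    mask = np.ones(max_index + 1, dtype=bool)
    mask[idx] = False  # fancy indexing clears every index that occurs
    return np.flatnonzero(mask)  # positions still True are the missing indices

print(missing_np([[1, 2, 3, 4], [5, 6, 7, 8]], max_index=10).tolist())  # [0, 9, 10]
```

Like the pure-Python version, this uses time and memory proportional to the input size plus the index range, but the per-element work happens inside numpy rather than in a Python loop.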