Remove elements from a list that appear in another list and return their indices

Time: 2018-10-05 09:12:58

Tags: python algorithm list

For example, given the lists

my_list = ['a', 'd', 'a', 'd', 'c','e']
words_2_remove = ['a', 'c']

the output should be:

my_list = ['d', 'd', 'e']
loc = [0, 2, 4]

I am currently using this:

loc = []
for word in my_list:
    if word in words_2_remove:
        loc.append(my_list.index(word))
        my_list.remove(word)

Is there a better option?

3 answers:

Answer 0 (score: 3)

Use two list comprehensions:

my_list =['a', 'd', 'a', 'd', 'c','e']
words_2_remove = ['a', 'c']

loc = [i for i, x in enumerate(my_list) if x in words_2_remove]

my_list = [x for x in my_list if x not in words_2_remove]

print(my_list) # ['d', 'd', 'e']
print(loc)     # [0, 2, 4]
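
One caveat with the comprehensions above: `x in words_2_remove` scans the removal list on every test, so the cost grows with the length of both lists. A minimal sketch, assuming the elements are hashable, converts the removal list to a set first. The name `remove_set` is only illustrative and is not part of the original answer.

```python
my_list = ['a', 'd', 'a', 'd', 'c', 'e']
words_2_remove = ['a', 'c']

# Assumption: all elements are hashable, so a set gives O(1) average lookups.
remove_set = set(words_2_remove)

# Collect the indices against the original list before rebinding my_list.
loc = [i for i, x in enumerate(my_list) if x in remove_set]
my_list = [x for x in my_list if x not in remove_set]

print(my_list)  # ['d', 'd', 'e']
print(loc)      # [0, 2, 4]
```

For a two-element removal list the difference is negligible; the set is likely to matter only once `words_2_remove` becomes long.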

Answer 1 (score: 1)

For larger arrays, NumPy is more efficient:

import numpy as np


my_list = np.array(['a', 'd', 'a', 'd', 'c','e'])
words_2_remove = np.array(['a', 'c'])

mask = np.isin(my_list, words_2_remove, invert=True)
# mask will be [False  True False  True False  True]
loc = np.where(~mask)[0]

print(loc)
>>> [0 2 4]

print(my_list[mask])
>>> ['d' 'd' 'e']

Getting the complement of the loc indices is also easy:

print(np.where(mask)[0])
>>> [1 3 5]
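
If the rest of the program works with plain Python lists, the NumPy steps above could be wrapped so that lists go in and lists come out. This is a sketch rather than part of the original answer: the function name `remove_with_numpy` is hypothetical, and it assumes the input can be converted to a one-dimensional array.

```python
import numpy as np


def remove_with_numpy(items, to_remove):
    """Return (kept_items, removed_indices, kept_indices) as Python lists."""
    arr = np.asarray(items)
    hit = np.isin(arr, to_remove)  # True where the element should be removed
    kept = arr[~hit].tolist()
    removed_indices = np.flatnonzero(hit).tolist()
    kept_indices = np.flatnonzero(~hit).tolist()  # complement of removed_indices
    return kept, removed_indices, kept_indices


kept, loc, rest = remove_with_numpy(['a', 'd', 'a', 'd', 'c', 'e'], ['a', 'c'])
print(kept)  # ['d', 'd', 'e']
print(loc)   # [0, 2, 4]
print(rest)  # [1, 3, 5]
```

Here `np.flatnonzero(hit)` plays the same role as `np.where(hit)[0]` for a one-dimensional array.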

Timings:
Compared with @Austin's list-comprehension version.
For the original arrays:

my_list = np.array(['a', 'd', 'a', 'd', 'c','e'])
words_2_remove = np.array(['a', 'c'])

%%timeit
mask = np.isin(my_list, words_2_remove, invert=True)
loc = np.where(~mask)[0]
>>> 11 µs ± 53.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

my_list =['a', 'd', 'a', 'd', 'c','e']
words_2_remove = ['a', 'c']

%%timeit
loc = [i for i, x in enumerate(my_list) if x in words_2_remove]
res = [x for x in my_list if x not in words_2_remove]
>>> 1.31 µs ± 7.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For large arrays:

n = 10 ** 3
my_list = np.array(['a', 'd', 'a', 'd', 'c','e'] * n)
words_2_remove = np.array(['a', 'c'])

%%timeit
mask = np.isin(my_list, words_2_remove, invert=True)
loc = np.where(~mask)[0]
>>> 114 µs ± 906 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

my_list =['a', 'd', 'a', 'd', 'c','e'] * n
words_2_remove = ['a', 'c']

%%timeit
loc = [i for i, x in enumerate(my_list) if x in words_2_remove]
res = [x for x in my_list if x not in words_2_remove]
>>> 841 µs ± 677 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

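Depending on your use case, choose whichever fits better: in the timings above the list comprehensions are faster for the small input, while NumPy is faster for the large one.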
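
The `%%timeit` cells above only run inside IPython or Jupyter. To repeat the comparison from a plain script, one option is the standard-library `timeit` module. The sketch below is an assumption about how such a harness might look, not the benchmark used for the numbers above; the function names are hypothetical, and the NumPy function also applies the mask so that both versions produce the filtered data and the indices.

```python
import timeit

import numpy as np

n = 10 ** 3
base = ['a', 'd', 'a', 'd', 'c', 'e'] * n
words_2_remove = ['a', 'c']

arr = np.array(base)
words_arr = np.array(words_2_remove)


def numpy_version():
    mask = np.isin(arr, words_arr, invert=True)
    loc = np.where(~mask)[0]
    return arr[mask], loc


def list_version():
    loc = [i for i, x in enumerate(base) if x in words_2_remove]
    res = [x for x in base if x not in words_2_remove]
    return res, loc


# Both versions must agree before their timings are worth comparing.
np_res, np_loc = numpy_version()
py_res, py_loc = list_version()
assert np_res.tolist() == py_res
assert np_loc.tolist() == py_loc

for func in (numpy_version, list_version):
    # Best of 5 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(func, number=100, repeat=5)) / 100
    print(f"{func.__name__}: {best * 1e6:.1f} us per call")
```

Absolute numbers will differ between machines and NumPy versions, so only the relative ordering is worth reading from the output.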


Further reading:

Documentation for np.isin: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.isin.html
Converting a boolean mask array into indices: How to turn a boolean array into index array in numpy
Documentation for np.where: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.where.html
More on indexing with NumPy: https://docs.scipy.org/doc/numpy-1.15.1/reference/arrays.indexing.html

Answer 2 (score: 0)

Use list comprehensions with enumerate:

loc = [idx for idx, item in enumerate(my_list) if item in words_2_remove]
my_list = [i for i in my_list if i not in words_2_remove]

Or use filter (this produces only the filtered list, not the indices):

my_list = list(filter(lambda x: x not in words_2_remove, my_list))

Written out as an explicit loop:

loc = []
new_my_list = []
for idx, item in enumerate(my_list):
    if item in words_2_remove:
        loc.append(idx)
    else:
        new_my_list.append(item)
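
The loop in the question misbehaves because it removes items from `my_list` while iterating over it, and because `my_list.index(word)` reports the position in the already shortened list: on the sample data it yields `loc = [0, 1, 2]` instead of `[0, 2, 4]`. If the original list object has to be modified in place, for example because other code holds a reference to it, a minimal sketch is to compute the indices first and then assign through a slice. The variable `alias` is hypothetical and exists only to show that the same object is kept.

```python
my_list = ['a', 'd', 'a', 'd', 'c', 'e']
words_2_remove = ['a', 'c']
alias = my_list  # hypothetical second reference to the same list object

# Record the indices against the original, unmodified list.
loc = [idx for idx, item in enumerate(my_list) if item in words_2_remove]

# Slice assignment replaces the contents without creating a new list object.
my_list[:] = [item for item in my_list if item not in words_2_remove]

print(my_list)           # ['d', 'd', 'e']
print(loc)               # [0, 2, 4]
print(alias is my_list)  # True
```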