I have two Python lists:
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
I need to filter out every tuple in a whose first element appears in b. In this case I should get:
c = [('why', 4), ('throw', 9), ('you', 1)]
What is the most efficient way to do this?
Answer 0 (score: 10)
A list comprehension will work.
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
filtered = [i for i in a if i[0] not in b]
>>> print(filtered)
[('why', 4), ('throw', 9), ('you', 1)]
Answer 1 (score: 4)
A list comprehension should work:
c = [item for item in a if item[0] not in b]
Or use a dictionary comprehension (note that the result is a dict, not a list):
d = dict(a)
c = {key: value for key, value in d.items() if key not in b}
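A minimal sketch of the dictionary route, assuming Python 3.7+ (where dicts preserve insertion order) and that the first elements of a are unique, since dict(a) keeps only the last value for a repeated key. Converting back with list(...items()) recovers the same list of tuples as the list comprehension:

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

d = dict(a)  # assumes the keys in a are unique; duplicates would be collapsed
excluded = set(b)
c_dict = {key: value for key, value in d.items() if key not in excluded}

# Back to a list of tuples; insertion order is preserved in Python 3.7+
c = list(c_dict.items())
print(c)  # [('why', 4), ('throw', 9), ('you', 1)]
```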
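Answer 2 (score: 2)
`in` works fine, but you should at least turn b into a set. If you have numpy you can of course also try np.in1d, but you should measure whether it is actually faster.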
import numpy as np

# Copied from above, but with b converted to a set for fast lookups
b_set = set(b)
filtered = [i for i in a if i[0] not in b_set]

# With numpy. Creating the structured array this way requires fixing the
# maximum string length up front (here 10); otherwise use an object array,
# which is slower (probably not worth it) but safe.
a_arr = np.array(a, dtype=[('key', 'U10'), ('val', int)])
b_arr = np.asarray(b)  # use the list: np.asarray on a set gives a 0-d object array
mask = ~np.in1d(a_arr['key'], b_arr)  # np.isin is the newer equivalent
filtered = a_arr[mask]
Sets also have methods such as difference; they may not be useful here, but often are.
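As an illustration of how difference could still fit here (not part of the original answer): compute the set of keys to keep once, then filter a against it. This keeps the original order of a, which a plain set operation on the tuples would not:

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

# Keys present in a but not in b; set.difference accepts any iterable
keep = {key for key, _ in a}.difference(b)

# Filter a against the kept keys, preserving the order of a
c = [item for item in a if item[0] in keep]
print(c)  # [('why', 4), ('throw', 9), ('you', 1)]
```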
Answer 3 (score: 2)
Since this question is tagged numpy, here is a numpy solution using numpy.in1d, benchmarked against the list comprehension:
In [1]: a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
In [2]: b = ['the', 'when', 'send', 'we', 'us']
In [3]: a_ar = np.array(a, dtype=[('string', 'U5'), ('number', float)])
In [4]: b_ar = np.array(b)
In [5]: %timeit filtered = [i for i in a if not i[0] in b]
1000000 loops, best of 3: 778 ns per loop
In [6]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
10000 loops, best of 3: 31.4 us per loop
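So for 5 records, the list comprehension is faster.
For a larger dataset (5000 records), however, the numpy solution is about twice as fast as the list comprehension: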
In [7]: a = a * 1000
In [8]: a_ar = np.array(a, dtype=[('string', 'U5'), ('number', float)])
In [9]: %timeit filtered = [i for i in a if not i[0] in b]
1000 loops, best of 3: 647 us per loop
In [10]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
1000 loops, best of 3: 302 us per loop
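Timings like these depend heavily on the machine and library versions. A minimal stdlib-only sketch for re-running the list-versus-set comparison on your own data with timeit, assuming the same 5000-element a as above; no results are shown because they will differ per machine:

```python
import timeit

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)] * 1000
b = ['the', 'when', 'send', 'we', 'us']
b_set = set(b)

def with_list():
    # Each membership test scans the list b
    return [i for i in a if i[0] not in b]

def with_set():
    # Each membership test is an average O(1) hash lookup
    return [i for i in a if i[0] not in b_set]

# Make sure both approaches agree before timing them
assert with_list() == with_set()

for func in (with_list, with_set):
    # Best of 3 repeats, 50 calls per repeat
    best = min(timeit.repeat(func, number=50, repeat=3))
    print(f"{func.__name__}: {best / 50 * 1e6:.1f} us per call")
```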
Answer 4 (score: 0)
Try this:
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
c = []
for x in a:
    if x[0] not in b:
        c.append(x)
print(c)
Answer 5 (score: 0)
Simple:
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
c = []  # a list to store the required tuples
# keep each tuple whose first element is not in b
for i in a:
    if i[0] not in b:
        c.append(i)
print(c)
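If a is large or comes from a stream, the same loop can be written as a generator so matching tuples are produced lazily instead of being collected up front. A minimal sketch; the function name exclude_keys is hypothetical:

```python
from typing import Iterable, Iterator, Tuple

def exclude_keys(pairs: Iterable[Tuple[str, int]],
                 excluded: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield each (key, value) pair whose key is not in excluded."""
    excluded_set = set(excluded)  # build the lookup set once
    for pair in pairs:
        if pair[0] not in excluded_set:
            yield pair

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
c = list(exclude_keys(a, b))
print(c)  # [('why', 4), ('throw', 9), ('you', 1)]
```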
Answer 6 (score: -1)
Using filter:
c = list(filter(lambda item: item[0] not in b, a))
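In Python 3, filter returns a lazy iterator and lambdas can no longer unpack tuple parameters such as (x, y), so the result has to be wrapped in list(...) and the key accessed by index. A sketch of an alternative using itertools.filterfalse, which keeps the items for which the predicate is false:

```python
from itertools import filterfalse

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b_set = {'the', 'when', 'send', 'we', 'us'}

# filterfalse keeps the tuples for which the predicate returns False,
# i.e. those whose first element is not in b_set
c = list(filterfalse(lambda t: t[0] in b_set, a))
print(c)  # [('why', 4), ('throw', 9), ('you', 1)]
```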