Question

How do I filter a list by a DataFrame in Python?

For example, I have the list L = ['a', 'b', 'c'] and the DataFrame df:

Name Value
   a     0
   a     1
   b     2
   d     3

The result should be ['a', 'b'].

Answer 1

a = df.loc[df['Name'].isin(L), 'Name'].unique().tolist()
print (a)
['a', 'b']

Or:

a = np.intersect1d(L, df['Name']).tolist()
print (a)
['a', 'b']
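
If you want to run the two snippets above from scratch, here is a minimal self-contained sketch. The DataFrame construction is an assumption, rebuilt from the table in the question; the two expressions are the same as above, and the names via_isin and via_intersect are only for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example data from the question
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
L = ['a', 'b', 'c']

# Keep rows whose Name is in L, then drop duplicate names
via_isin = df.loc[df['Name'].isin(L), 'Name'].unique().tolist()

# Intersection of the two collections; np.intersect1d returns sorted unique values
via_intersect = np.intersect1d(L, df['Name']).tolist()

print(via_isin)
print(via_intersect)
```

Both lines should print ['a', 'b'] for this data.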

Timings:

df = pd.concat([df]*1000).reset_index(drop=True)

L = ['a', 'b', 'c']

#jezrael 1
In [163]: %timeit (df.loc[df['Name'].isin(L), 'Name'].unique().tolist())
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 774 µs per loop

#jezrael 2    
In [164]: %timeit (np.intersect1d(L, df['Name']).tolist())
1000 loops, best of 3: 1.81 ms per loop

#divakar
In [165]: %timeit ([i for i in L if i in df.Name.tolist()])
1000 loops, best of 3: 393 µs per loop

#john galt 1
In [166]: %timeit (df.query('Name in @L').Name.unique().tolist())
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.36 ms per loop

#john galt 2    
In [167]: %timeit ([x for x in df.Name.unique() if x in L])
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 182 µs per loop
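
The numbers above come from IPython's %timeit and will vary by machine and library version. As a rough sketch, the same comparison can probably be rerun as a plain script with the standard timeit module. The DataFrame construction, the labels, and the repeat/number settings below are my assumptions, not part of the original measurements.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data, enlarged as in the timings above
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
df = pd.concat([df] * 1000).reset_index(drop=True)
L = ['a', 'b', 'c']

# The five expressions timed above, as strings for timeit
statements = {
    'isin + unique': "df.loc[df['Name'].isin(L), 'Name'].unique().tolist()",
    'np.intersect1d': "np.intersect1d(L, df['Name']).tolist()",
    'comprehension over L': "[i for i in L if i in df.Name.tolist()]",
    'query + unique': "df.query('Name in @L').Name.unique().tolist()",
    'comprehension over unique': "[x for x in df.Name.unique() if x in L]",
}

namespace = {'df': df, 'L': L, 'np': np}
runs = 50  # arbitrary choice, kept small so the script finishes quickly

timings = {}
for label, stmt in statements.items():
    # Best of 3 repeats, converted to seconds per single run
    best = min(timeit.repeat(stmt, globals=namespace, repeat=3, number=runs))
    timings[label] = best / runs
    print(f'{label:28s} {timings[label] * 1e6:10.1f} µs per loop')
```

Keep in mind that this test frame has only three distinct names and L has three items, so the ranking may change for longer lists or more varied data.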

Answer 2

Here's one way -

[i for i in l if i in df.Name.tolist()]

Sample run -

In [303]: df
Out[303]: 
  Name  Value
0    a      0
1    a      1
2    b      2
3    d      3

In [304]: l = ['a', 'b', 'c']

In [305]: [i for i in l if i in df.Name.tolist()]
Out[305]: ['a', 'b']
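
One thing to note about the comprehension above: the condition calls df.Name.tolist() again for every element of l, and membership tests on a list are linear scans. A possible variant, sketched here under the assumption that the names are hashable (strings are), builds a set once and tests against that. The variable names are only for illustration.

```python
import pandas as pd

# Assumed reconstruction of the example data from the question
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
l = ['a', 'b', 'c']

# Build the lookup set a single time, outside the comprehension
names = set(df['Name'])

# Keeps the order of l and only the items that appear in the Name column
result = [i for i in l if i in names]
print(result)
```

For the sample data this prints ['a', 'b'], the same as the original one-liner.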

Answer 3

Another way, using query:

In [1470]: df.query('Name in @L').Name.unique().tolist()
Out[1470]: ['a', 'b']

Or,

In [1472]: [x for x in df.Name.unique() if x in L]
Out[1472]: ['a', 'b']
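
The approaches in this thread agree on the sample data, but they can differ in the order of the result. Here is a small sketch that makes this visible, using a reordered L (my assumption, not from the question). Iterating over df.Name.unique() follows the order of first appearance in the DataFrame, while iterating over L follows the list.

```python
import pandas as pd

# Assumed reconstruction of the example data from the question
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
L = ['b', 'a', 'c']  # same items as in the question, deliberately reordered

# Both of these follow the order in which names first appear in df
by_query = df.query('Name in @L').Name.unique().tolist()
by_df_order = [x for x in df.Name.unique() if x in L]

# Iterating over L instead keeps the order of the list
names = set(df.Name)
by_list_order = [x for x in L if x in names]

print(by_query)       # ['a', 'b']
print(by_df_order)    # ['a', 'b']
print(by_list_order)  # ['b', 'a']
```

So if the output should follow the order of L, a comprehension over L is probably the simpler choice.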
