How do I filter a list by a DataFrame in Python?

Date: 2017-09-03 08:29:52

Tags: python list pandas numpy dataframe

How can I filter a list by a DataFrame in Python?

For example, I have the list L = ['a', 'b', 'c'] and the DataFrame df:

Name Value
   a     0
   a     1
   b     2
   d     3

The result should be ['a', 'b'].

3 answers:

Answer 0 (score: 1)

a = df.loc[df['Name'].isin(L), 'Name'].unique().tolist()
print (a)
['a', 'b']

Or:

a = np.intersect1d(L, df['Name']).tolist()
print (a)
['a', 'b']
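For readers who want to run the two approaches above outside an interactive session, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question; the variable names `by_isin` and `by_intersect` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Name": ["a", "a", "b", "d"], "Value": [0, 1, 2, 3]})
L = ["a", "b", "c"]

# Approach 1: keep the rows whose Name is in L, then de-duplicate.
# unique() returns values in order of first appearance in df.
by_isin = df.loc[df["Name"].isin(L), "Name"].unique().tolist()

# Approach 2: set intersection via NumPy.
# intersect1d returns sorted unique values, so the order is the sort order.
by_intersect = np.intersect1d(L, df["Name"]).tolist()

print(by_isin)
print(by_intersect)
```

The two results agree on this data, but they can come out in different orders in general: the first follows the order of the DataFrame, the second is always sorted.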

Timings:

df = pd.concat([df]*1000).reset_index(drop=True)

L = ['a', 'b', 'c']

#jezrael 1
In [163]: %timeit (df.loc[df['Name'].isin(L), 'Name'].unique().tolist())
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 774 µs per loop

#jezrael 2    
In [164]: %timeit (np.intersect1d(L, df['Name']).tolist())
1000 loops, best of 3: 1.81 ms per loop

#divakar
In [165]: %timeit ([i for i in L if i in df.Name.tolist()])
1000 loops, best of 3: 393 µs per loop

#john galt 1
In [166]: %timeit (df.query('Name in @L').Name.unique().tolist())
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.36 ms per loop

#john galt 2    
In [167]: %timeit ([x for x in df.Name.unique() if x in L])
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 182 µs per loop

Answer 1 (score: 1)

Here is one way:

[i for i in l if i in df.Name.tolist()]

Sample run:

In [303]: df
Out[303]: 
  Name  Value
0    a      0
1    a      1
2    b      2
3    d      3

In [304]: l = ['a', 'b', 'c']

In [305]: [i for i in l if i in df.Name.tolist()]
Out[305]: ['a', 'b']
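One thing to watch in the list comprehension above is that `df.Name.tolist()` is evaluated again for every element of the list, and each `in` check scans that list linearly. A possible variation, sketched here under the assumption that the column values are hashable, builds a set once and tests membership against it. The names `names` and `filtered` are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Name": ["a", "a", "b", "d"], "Value": [0, 1, 2, 3]})
l = ["a", "b", "c"]

# Build the lookup set once instead of converting the column per element.
names = set(df["Name"])

# Iterating over l preserves the order of the original list.
filtered = [item for item in l if item in names]

print(filtered)
```

This keeps the same result and ordering as the original one-liner; whether it is measurably faster depends on the sizes of the list and the column.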

Answer 2 (score: 1)

Another way, using query:

In [1470]: df.query('Name in @L').Name.unique().tolist()
Out[1470]: ['a', 'b']

Or,

In [1472]: [x for x in df.Name.unique() if x in L]
Out[1472]: ['a', 'b']
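The answers differ in which sequence they iterate over, and that determines the order of the result. The sketch below illustrates this with a reordered list; the list `L_reordered` and the result names are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Sample data from the question, with the list deliberately reordered.
df = pd.DataFrame({"Name": ["a", "a", "b", "d"], "Value": [0, 1, 2, 3]})
L_reordered = ["b", "a", "c"]

# Iterating over the DataFrame's unique names follows the DataFrame's order.
in_df_order = [x for x in df["Name"].unique() if x in L_reordered]

# Iterating over the list follows the list's order.
column_values = set(df["Name"])
in_list_order = [x for x in L_reordered if x in column_values]

print(in_df_order)
print(in_list_order)
```

If the order of the original list matters, iterate over the list; if the order of appearance in the DataFrame matters, iterate over the unique column values.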