Get column values by condition

Time: 2016-06-16 13:46:37

Tags: python pandas dataframe conditional-statements

I have the following DataFrame:

   A      B
0  1      5
1  2      3
2  3      2
3  4      0
4  5      1

How can I select rows by a condition on the values in column A?

For example, all values greater than 3 and less than 6.

2 answers:

Answer 0: (score: 1)

You can use boolean indexing, with the endpoints of your interval as the conditions:

df[(df.A > 3) & (df.A < 6)]

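or the convenience method .between(), which translates into the above behind the scenes (so it is not noticeably faster). With .between() you need to be aware that the limits are included by default: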

df[df.A.between(4, 5)] # uses inclusive limits

which gives:

   A  B
3  4  0
4  5  1
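As a minimal, self-contained sketch of the two approaches above, the following rebuilds the DataFrame from the question and checks that both masks select the same rows. The variable names mask_strict and mask_between are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 3, 2, 0, 1]})

# Strict bounds: 3 < A < 6. The parentheses are required because
# & binds more tightly than the comparison operators.
mask_strict = (df["A"] > 3) & (df["A"] < 6)

# between() includes both endpoints by default, so for integer data
# the closed interval [4, 5] selects the same rows as 3 < A < 6.
mask_between = df["A"].between(4, 5)

result = df[mask_strict]
print(result)
```

Note that the two masks only coincide here because column A holds integers; for floating-point data, between(4, 5) would miss a value such as 3.5 that the strict comparison keeps.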

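Answer 1: (score: 0)

Use between with boolean indexing (optionally with the parameter inclusive=False to exclude both endpoints; from pandas 1.3 onward this is spelled inclusive='neither'):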

print (df[df.A.between(4,5)])

Sample:

import pandas as pd

df = pd.DataFrame({'A': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6},
                   'B': {0: 5, 1: 3, 2: 2, 3: 0, 4: 2, 5: 1}})
print (df)
   A  B
0  1  5
1  2  3
2  3  2
3  4  0
4  5  2
5  6  1

print (df[df.A.between(4,5)]) #default inclusive=True
   A  B
3  4  0
4  5  2

print (df[df.A.between(3,6, inclusive=False)])
   A  B
3  4  0
4  5  2
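The boolean form of inclusive shown above reflects the pandas version current when this answer was written. As a hedged sketch, assuming pandas 1.3 or later (where inclusive accepts the strings 'both', 'neither', 'left' and 'right'), the four endpoint choices look like this on the same sample data. The variable names are illustrative.

```python
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [5, 3, 2, 0, 2, 1]})

# Assumption: pandas >= 1.3, where `inclusive` takes a string
both = df[df["A"].between(3, 6, inclusive="both")]        # 3 <= A <= 6
neither = df[df["A"].between(3, 6, inclusive="neither")]  # 3 <  A <  6
left = df[df["A"].between(3, 6, inclusive="left")]        # 3 <= A <  6
right = df[df["A"].between(3, 6, inclusive="right")]      # 3 <  A <= 6

print(neither)
```

On older pandas versions the string values are not accepted, so the inclusive=False spelling from the answer is the one to use there.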

The timings are the same:

df = pd.concat([df]*10000).reset_index(drop=True)

In [427]: %timeit (df[df.A.between(3,6, inclusive=False)])
The slowest run took 4.72 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.32 ms per loop

In [428]: %timeit (df[(df.A>3) & (df.A<6)])
1000 loops, best of 3: 1.31 ms per loop
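The %timeit lines above require IPython. As a rough sketch for reproducing the comparison in a plain script, the standard-library timeit module can be used instead. The helper names with_between and with_mask and the repeat count n are assumptions for illustration, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [5, 3, 2, 0, 2, 1]})
# Repeat the 6-row sample 10000 times, as in the answer
big = pd.concat([df] * 10000).reset_index(drop=True)


def with_between():
    # Closed interval [4, 5]; equivalent to 3 < A < 6 for integer data
    return big[big["A"].between(4, 5)]


def with_mask():
    return big[(big["A"] > 3) & (big["A"] < 6)]


n = 100  # number of calls per measurement (arbitrary choice)
t_between = timeit.timeit(with_between, number=n) / n
t_mask = timeit.timeit(with_mask, number=n) / n

print(f"between: {t_between * 1e3:.3f} ms per call")
print(f"mask:    {t_mask * 1e3:.3f} ms per call")
```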