DataFrame conditional logic

Time: 2016-06-26 20:05:05

Tags: python pandas

I have a DataFrame ('dayData') containing the columns 'Power1' and 'Power2':

      Power1         Power2   
 1.049246442   -0.231991505  
-0.950753558    0.276990531  
-0.950753558    0.531481549  
 0             -0.231991505  
-0.464648091   -0.231991505  
 1.049246442   -1.204952258   
 0.455388896   -0.486482523   
 0.879383766    0.226092327   
-0.50417844     0.83687077   
 0.152025349   -0.359237014  

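I am trying to use conditional logic to create a 'ResultPower' column. For each row, the logic I am trying to implement is:

if (Power1 >= 0 AND Power2 <= 0) OR (Power1 <= 0 AND Power2 >= 0) then 0, else return the value of Power1.

So once the ResultPower column is added, the DataFrame would look like this: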

      Power1         Power2   ResultPower
 1.049246442   -0.231991505             0
-0.950753558    0.276990531             0
-0.950753558    0.531481549             0
 0             -0.231991505             0
-0.464648091   -0.231991505  -0.464648091
 1.049246442   -1.204952258             0
 0.455388896   -0.486482523             0
 0.879383766    0.226092327   0.879383766
-0.50417844     0.83687077              0
 0.152025349   -0.359237014             0

I have used basic conditional logic in pandas before; for example, I can check one of the logical conditions, i.e.:

dayData['ResultPower'] = np.where(dayData.Power1 > 0, 0, dayData.Power1)

But I cannot find out how to combine logical conditions with AND / OR, to build something like:

dayData['ResultPower'] = np.where(dayData.Power1 >= 0 and dayData.Power2 =< 0 or dayData.Power1 =< 0 and dayData.Power2 >= 0, 0, dayData.Power1)

Can someone tell me whether this is possible, and what the syntax would be?

Reproducing the DataFrame

import pandas as pd
from io import StringIO

datastring = StringIO("""\
      Power1         Power2   
 1.049246442   -0.231991505  
-0.950753558    0.276990531  
-0.950753558    0.531481549  
 0             -0.231991505  
-0.464648091   -0.231991505  
 1.049246442   -1.204952258   
 0.455388896   -0.486482523   
 0.879383766    0.226092327   
-0.50417844     0.83687077   
 0.152025349   -0.359237014  
""")

# Raw string so that \s is passed to the regex engine rather than treated as a string escape
df = pd.read_table(datastring, sep=r'\s\s+', engine='python')

2 answers:

Answer 0: (score: 1)

df['ResultPower'] = df['Power1']
cond1 = (df.Power1 >= 0) & (df.Power2 <= 0)
cond2 = (df.Power1 <= 0) & (df.Power2 >= 0)
df.loc[cond1 | cond2, 'ResultPower'] = 0

Using timeit: 100 loops, best of 3: 1.87 ms per loop
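The same result can also be sketched with Series.mask, which replaces values wherever a condition is True. This is an alternative, not something the answer above proposes. The frame is rebuilt inline from the question's data (under the name df, as in the reproduction snippet) so that the example runs on its own.

```python
import pandas as pd

# Data copied from the question; built directly so no parsing is needed.
df = pd.DataFrame({
    "Power1": [1.049246442, -0.950753558, -0.950753558, 0, -0.464648091,
               1.049246442, 0.455388896, 0.879383766, -0.50417844, 0.152025349],
    "Power2": [-0.231991505, 0.276990531, 0.531481549, -0.231991505, -0.231991505,
               -1.204952258, -0.486482523, 0.226092327, 0.83687077, -0.359237014],
})

# Same two conditions as in the answer above.
cond1 = (df.Power1 >= 0) & (df.Power2 <= 0)
cond2 = (df.Power1 <= 0) & (df.Power2 >= 0)

# mask() keeps Power1 where the condition is False and writes 0 where it is True.
df["ResultPower"] = df["Power1"].mask(cond1 | cond2, 0)
print(df)
```

Compared with assigning through df.loc, this builds the column in a single expression and does not need the initial copy of Power1.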

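Answer 1: (score: 0)

When you need element-wise logical operations on pandas objects, use & in place of and, and | in place of or, with each comparison wrapped in parentheses. So this is what you are looking for: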

In [15]: dayData
Out[15]: 
     Power1    Power2
0  1.049246 -0.231992
1 -0.950754  0.276991
2 -0.950754  0.531482
3  0.000000 -0.231992
4 -0.464648 -0.231992
5  1.049246 -1.204952
6  0.455389 -0.486483
7  0.879384  0.226092
8 -0.504178  0.836871
9  0.152025 -0.359237

In [16]: dayData['ResultsPower'] = np.where(((dayData.Power1 >= 0) & (dayData.Power2 <= 0)) | ((dayData.Power1 <= 0) & (dayData.Power2 >=0)),0, dayData.Power1)

In [17]: dayData
Out[17]: 
     Power1    Power2  ResultsPower
0  1.049246 -0.231992      0.000000
1 -0.950754  0.276991      0.000000
2 -0.950754  0.531482      0.000000
3  0.000000 -0.231992      0.000000
4 -0.464648 -0.231992     -0.464648
5  1.049246 -1.204952      0.000000
6  0.455389 -0.486483      0.000000
7  0.879384  0.226092      0.879384
8 -0.504178  0.836871      0.000000
9  0.152025 -0.359237      0.000000

Read more about this here:

http://pandas.pydata.org/pandas-docs/version/0.13.1/gotchas.html#bitwise-boolean
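To make the difference concrete, here is a minimal sketch of what happens with and versus &. The two small Series, s1 and s2, are made up for illustration and are not the question's data.

```python
import pandas as pd

# Hypothetical toy data, just to show the operator behaviour.
s1 = pd.Series([1.0, -1.0, 0.0])
s2 = pd.Series([-1.0, 1.0, 2.0])

# Python's `and` calls bool() on its left operand. pandas refuses to do that
# for a multi-element Series, because its truth value is ambiguous.
try:
    (s1 >= 0) and (s2 <= 0)
    and_failed = False
except ValueError:
    and_failed = True

# `&` is overloaded by pandas and works element by element.
# The parentheses are required: `&` binds tighter than `>=`, so
# `s1 >= 0 & s2 <= 0` would be parsed as `s1 >= (0 & s2) <= 0`.
both = (s1 >= 0) & (s2 <= 0)

print(and_failed)     # True
print(both.tolist())  # [True, False, False]
```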

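Another approach is to use the DataFrame's apply method, which applies a function along the rows or columns of the DataFrame. First, define your function: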

In [18]: def my_function(S):
   ....:     if ((S.Power1 >=0) and (S.Power2 <=0)) or ((S.Power1 <=0) and (S.Power2 >= 0)):
   ....:         return 0
   ....:     else:
   ....:         return S.Power1
   ....:

To process each row, now call apply with axis=1:

In [29]: dayData.apply(my_function, axis=1)
Out[29]: 
0    0.000000
1    0.000000
2    0.000000
3    0.000000
4   -0.464648
5    0.000000
6    0.000000
7    0.879384
8    0.000000
9    0.000000
dtype: float64

Now we can compare the speed of each operation:

In [31]: timeit np.where(((dayData.Power1 >= 0) & (dayData.Power2 <= 0)) | ((dayData.Power1 <= 0) & (dayData.Power2 >=0)),0, dayData.Power1)
100 loops, best of 3: 2.21 ms per loop

In [32]: timeit dayData.apply(my_function, axis=1)
1000 loops, best of 3: 990 µs per loop

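So it seems that apply is faster in this case, but that may be because np.where has to convert data structures. Note that these timings come from a 10-row frame, where fixed per-call overhead dominates; the vectorized np.where version usually scales much better as the number of rows grows.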
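To see how the two approaches compare on more rows, a rough benchmark can be sketched as follows. The synthetic frame (big), its size n, and the helper names with_where and with_apply are assumptions made for illustration; absolute timings will vary by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 10,000 rows of standard-normal values.
rng = np.random.default_rng(0)
n = 10_000
big = pd.DataFrame({"Power1": rng.standard_normal(n),
                    "Power2": rng.standard_normal(n)})

def with_where(df):
    # Vectorized version: one pass over whole columns.
    cond = ((df.Power1 >= 0) & (df.Power2 <= 0)) | ((df.Power1 <= 0) & (df.Power2 >= 0))
    return np.where(cond, 0, df.Power1)

def my_function(S):
    # Row-wise version from the answer: S is a single row.
    if ((S.Power1 >= 0) and (S.Power2 <= 0)) or ((S.Power1 <= 0) and (S.Power2 >= 0)):
        return 0
    else:
        return S.Power1

def with_apply(df):
    return df.apply(my_function, axis=1)

# Average seconds per call over three runs each.
t_where = timeit.timeit(lambda: with_where(big), number=3) / 3
t_apply = timeit.timeit(lambda: with_apply(big), number=3) / 3
print(f"np.where: {t_where * 1e3:.2f} ms per call")
print(f"apply:    {t_apply * 1e3:.2f} ms per call")
```

Both functions compute the same values, so only the running time differs; raising n makes the gap between the row-wise and the vectorized version easier to observe.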