I have a DataFrame ('dayData') containing the columns 'Power1' and 'Power2':
Power1 Power2
1.049246442 -0.231991505
-0.950753558 0.276990531
-0.950753558 0.531481549
0 -0.231991505
-0.464648091 -0.231991505
1.049246442 -1.204952258
0.455388896 -0.486482523
0.879383766 0.226092327
-0.50417844 0.83687077
0.152025349 -0.359237014
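I am trying to use conditional logic to create a 'ResultPower' column. The logic I want to apply to each row is:
if (Power1 >= 0 AND Power2 <= 0) OR (Power1 <= 0 AND Power2 >= 0) then 0, else return the value of Power1.
So once the ResultPower column is added, the DataFrame should look like this: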
Power1 Power2 ResultPower
1.049246442 -0.231991505 0
-0.950753558 0.276990531 0
-0.950753558 0.531481549 0
0 -0.231991505 0
-0.464648091 -0.231991505 -0.464648091
1.049246442 -1.204952258 0
0.455388896 -0.486482523 0
0.879383766 0.226092327 0.879383766
-0.50417844 0.83687077 0
0.152025349 -0.359237014 0
I have used basic conditional logic in pandas before. For example, I can check a single condition like this:
dayData['ResultPower'] = np.where(dayData.Power1 > 0, 0, dayData.Power1)
But I cannot work out how to combine conditions with AND / OR, to build something like this:
dayData['ResultPower'] = np.where(dayData.Power1 >= 0 and dayData.Power2 =< 0 or dayData.Power1 =< 0 and dayData.Power2 >= 0, 0, dayData.Power1)
Can someone tell me whether this is possible, and what the syntax is?
import pandas as pd
from io import StringIO
datastring = StringIO("""\
Power1 Power2
1.049246442 -0.231991505
-0.950753558 0.276990531
-0.950753558 0.531481549
0 -0.231991505
-0.464648091 -0.231991505
1.049246442 -1.204952258
0.455388896 -0.486482523
0.879383766 0.226092327
-0.50417844 0.83687077
0.152025349 -0.359237014
""")
# Raw string avoids an invalid-escape warning; \s+ splits on any run of whitespace
df = pd.read_table(datastring, sep=r'\s+')
Answer 0 (score: 1)
df['ResultPower'] = df['Power1']
cond1 = (df.Power1 >= 0) & (df.Power2 <= 0)
cond2 = (df.Power1 <= 0) & (df.Power2 >= 0)
df.loc[cond1 | cond2, 'ResultPower'] = 0
Using timeit: 100 loops, best of 3: 1.87 ms per loop
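As a variation on the masking approach above, here is a minimal self-contained sketch that applies the same two conditions with Series.mask instead of copying the column and assigning through df.loc. Series.mask is a swapped-in alternative, not what the answer uses, and the frame is built inline from the question's values rather than parsed from text.

```python
import pandas as pd

# Same ten rows as in the question, built inline so the sketch runs on its own
df = pd.DataFrame({
    "Power1": [1.049246442, -0.950753558, -0.950753558, 0.0, -0.464648091,
               1.049246442, 0.455388896, 0.879383766, -0.50417844, 0.152025349],
    "Power2": [-0.231991505, 0.276990531, 0.531481549, -0.231991505, -0.231991505,
               -1.204952258, -0.486482523, 0.226092327, 0.83687077, -0.359237014],
})

cond1 = (df.Power1 >= 0) & (df.Power2 <= 0)
cond2 = (df.Power1 <= 0) & (df.Power2 >= 0)

# Series.mask replaces values where the condition is True and keeps the rest
df["ResultPower"] = df["Power1"].mask(cond1 | cond2, 0)
print(df)
```

Only rows 4 and 7, where both columns share a sign, keep their Power1 value.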
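Answer 1 (score: 0)
When you need element-wise logical operations on pandas objects, use '&' instead of 'and', and '|' instead of 'or'. Wrap each comparison in parentheses, because '&' and '|' bind more tightly than '>=' and '<='. So this is what you are looking for: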
In [15]: dayData
Out[15]:
Power1 Power2
0 1.049246 -0.231992
1 -0.950754 0.276991
2 -0.950754 0.531482
3 0.000000 -0.231992
4 -0.464648 -0.231992
5 1.049246 -1.204952
6 0.455389 -0.486483
7 0.879384 0.226092
8 -0.504178 0.836871
9 0.152025 -0.359237
In [16]: dayData['ResultsPower'] = np.where(((dayData.Power1 >= 0) & (dayData.Power2 <= 0)) | ((dayData.Power1 <= 0) & (dayData.Power2 >=0)),0, dayData.Power1)
In [17]: dayData
Out[17]:
Power1 Power2 ResultsPower
0 1.049246 -0.231992 0.000000
1 -0.950754 0.276991 0.000000
2 -0.950754 0.531482 0.000000
3 0.000000 -0.231992 0.000000
4 -0.464648 -0.231992 -0.464648
5 1.049246 -1.204952 0.000000
6 0.455389 -0.486483 0.000000
7 0.879384 0.226092 0.879384
8 -0.504178 0.836871 0.000000
9 0.152025 -0.359237 0.000000
Read more about this here:
http://pandas.pydata.org/pandas-docs/version/0.13.1/gotchas.html#bitwise-boolean
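To make the difference concrete, the following sketch shows what goes wrong with the plain Python keywords and how the bitwise operators fix it. The frame day_data and its four toy rows are made up for illustration and are not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, just large enough to hit every branch
day_data = pd.DataFrame({
    "Power1": [1.0, -1.0, -0.5, 0.9],
    "Power2": [-0.2, 0.3, -0.2, 0.2],
})

# `and` asks for the truth value of a whole Series, which pandas refuses to guess
try:
    (day_data.Power1 >= 0) and (day_data.Power2 <= 0)
    raised = False
except ValueError:
    raised = True

# `&` and `|` work element by element; each comparison needs its own parentheses
opposite = (((day_data.Power1 >= 0) & (day_data.Power2 <= 0))
            | ((day_data.Power1 <= 0) & (day_data.Power2 >= 0)))
result = np.where(opposite, 0, day_data.Power1)
print(raised, result)
```

Note that np.where returns a NumPy array, which pandas aligns by position when it is assigned to a new column.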
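Another approach is to use the DataFrame's apply method, which applies a function along the rows or columns of a DataFrame. First, define your function: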
In [18]: def my_function(S):
   ....:     if ((S.Power1 >= 0) and (S.Power2 <= 0)) or ((S.Power1 <= 0) and (S.Power2 >= 0)):
   ....:         return 0
   ....:     else:
   ....:         return S.Power1
   ....:
To process each row, call the apply method with axis=1:
In [29]: dayData.apply(my_function, axis=1)
Out[29]:
0 0.000000
1 0.000000
2 0.000000
3 0.000000
4 -0.464648
5 0.000000
6 0.000000
7 0.879384
8 0.000000
9 0.000000
dtype: float64
Now we can compare the speed of each operation:
In [31]: timeit np.where(((dayData.Power1 >= 0) & (dayData.Power2 <= 0)) | ((dayData.Power1 <= 0) & (dayData.Power2 >=0)),0, dayData.Power1)
100 loops, best of 3: 2.21 ms per loop
In [32]: timeit dayData.apply(my_function, axis=1)
1000 loops, best of 3: 990 µs per loop
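So apply appears to be faster in this case, but that may be because np.where has to convert between data structures. With only ten rows, fixed overhead dominates both timings; row-wise apply calls a Python function once per row, so the vectorized version generally scales much better on larger frames.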
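A rough way to see how the two approaches behave beyond ten rows is to time them on a larger frame. The sketch below assumes a randomly generated frame of 5,000 rows (the name big, the row count, and the helper names vectorized and row_wise are all made up for illustration). Timings depend on the machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame of random values, seeded so runs are repeatable
rng = np.random.default_rng(0)
big = pd.DataFrame({
    "Power1": rng.standard_normal(5_000),
    "Power2": rng.standard_normal(5_000),
})

def vectorized(frame):
    # Element-wise conditions combined with & and |, evaluated over whole columns
    cond = (((frame.Power1 >= 0) & (frame.Power2 <= 0))
            | ((frame.Power1 <= 0) & (frame.Power2 >= 0)))
    return np.where(cond, 0, frame.Power1)

def my_function(S):
    # Same logic on a single row, using plain Python and/or on scalars
    if ((S.Power1 >= 0) and (S.Power2 <= 0)) or ((S.Power1 <= 0) and (S.Power2 >= 0)):
        return 0
    return S.Power1

def row_wise(frame):
    return frame.apply(my_function, axis=1)

# Both approaches should produce the same values
same = np.allclose(vectorized(big), row_wise(big).to_numpy())

t_vec = timeit.timeit(lambda: vectorized(big), number=3)
t_apply = timeit.timeit(lambda: row_wise(big), number=3)
print(f"results match: {same}")
print(f"np.where: {t_vec:.4f} s, apply: {t_apply:.4f} s (3 runs each)")
```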