Populate a column in a pandas DataFrame based on multiple conditions on another column

Time: 2015-01-13 02:17:27

Tags: python-3.x numpy pandas

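I am trying to populate a column (signal) in a DataFrame, conditional on another column (diff) of the same DataFrame compared against 2 variables. The column to be filled has 3 possible outcomes, 1, -1, 0, meaning buy, sell, hold (cover). Here is the code and output so far.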

import numpy as np
import Quandl
tlm = Quandl.get("GOOG/NYSE_TLM", trim_start="2014-12-01", trim_end="2015-01-01")

tlm['diff'] = (tlm.Open - tlm.Close.shift(1))/tlm.Close.shift(1)  # lags data

lowerbound = -0.08
upperbound = 0.08

tlm['signal'] = np.where(tlm['diff'] >= upperbound, 1.0, 0.0)
tlm['signal'] = np.where(tlm['diff'] <= lowerbound, -1.0, 0.0)

print(tlm.head(20))  # is dataframe

            Open  High   Low  Close     Volume      diff  signal
Date                                                            
2014-12-01  4.91  4.93  4.53   4.53   12999427       NaN       0
2014-12-02  4.62  4.82  4.47   4.64    8015450  0.019868       0
2014-12-03  4.51  4.83  4.48   4.63    9175510 -0.028017       0
2014-12-04  4.59  4.62  4.04   4.05   16065766 -0.008639       0
2014-12-05  4.05  4.09  3.86   3.94    8783581  0.000000       0
2014-12-08  3.88  4.04  3.46   3.74   17497626 -0.015228       0
2014-12-09  4.09  4.36  4.04   4.22   12559347  0.093583       0
2014-12-10  4.20  4.20  3.67   3.79   12403674 -0.004739       0
2014-12-11  3.74  3.95  3.67   3.69    9396960 -0.013193       0
2014-12-12  5.05  5.24  4.17   4.29   75949020  0.368564       0
2014-12-15  5.33  5.35  4.99   5.12   38834129  0.242424       0
2014-12-16  7.47  7.60  7.46   7.58  282795097  0.458984       0
2014-12-17  7.59  7.66  7.55   7.64   73152687  0.001319       0
2014-12-18  7.68  7.82  7.66   7.78   55387941  0.005236       0
2014-12-19  7.77  7.89  7.77   7.85   31330786 -0.001285       0
2014-12-22  7.82  7.85  7.78   7.79   22758351 -0.003822       0
2014-12-23  7.79  7.88  7.79   7.84   19068732  0.000000       0
2014-12-24  7.83  7.86  7.82   7.84    9174813 -0.001276       0
2014-12-26  7.84  7.86  7.82   7.85    9717732  0.000000       0
2014-12-29  7.84  7.86  7.81   7.83   12035787 -0.001274       0

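The problem with the code above is that the line just before the print overwrites the result of the line before it. That earlier line works fine on its own: you would see a 1 in the appropriate rows of the signal column. So I turned to a for loop for the conditions, but I get a ValueError inside the loop. I somewhat understand that the boolean comparison problem has to do with NumPy arrays, but if I can't compare against the conditions, how do I generate the 3 outcomes (1, -1, 0)?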

for index, row in tlm.iterrows():
    if tlm['diff'] >= upperbound:  # value error here
        tlm['signal'] = 1.0
        if tlm['diff'] <= lowerbound:
            tlm['signal'] = -1.0
        else:
            tlm['signal'] = 0.0

I am new to coding and to pandas. Thanks in advance!

1 Answer:

Answer 0: (score: 0)

You can use np.select:

conditions = [tlm['diff'] >= upperbound,
              tlm['diff'] <= lowerbound]
choices = [1, -1]

tlm['signal'] = np.select(conditions, choices, default=0)
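
As a self-contained sketch of the same idea: the diff values below are made up for illustration and stand in for the Quandl data, which this example does not download, and the name df is likewise just a placeholder for tlm.

```python
import numpy as np
import pandas as pd

# Hypothetical 'diff' values standing in for the real tlm data.
df = pd.DataFrame({"diff": [np.nan, 0.019868, -0.028017, 0.093583, -0.12, 0.368564]})

lowerbound = -0.08
upperbound = 0.08

# Conditions are checked in order; the first one that is True picks the choice.
conditions = [df["diff"] >= upperbound,
              df["diff"] <= lowerbound]
choices = [1, -1]

# Rows matching no condition get the default. NaN compares as False
# against both bounds, so the first row also ends up with 0.
df["signal"] = np.select(conditions, choices, default=0)

print(df)
```

Because the two bounds cannot both be met by the same row here, the order of the conditions does not matter in this case; it would matter if the conditions overlapped.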

Or equivalently, but less readably:

tlm['signal'] = np.where(tlm['diff'] >= upperbound, 1.0, 
                         np.where(tlm['diff'] <= lowerbound, -1.0, 0.0))
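
For context on the two problems described in the question, here is a small sketch with made-up values (df is a placeholder, not the asker's data). The second np.where assignment rebuilds the whole signal column, so the 1.0 values from the first call are lost. Separately, putting a whole Series comparison in an if statement raises ValueError, because it yields one boolean per row rather than a single True or False.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({"diff": [0.10, -0.10, 0.01]})
lowerbound, upperbound = -0.08, 0.08

# Each assignment rebuilds the entire column, so the second call
# discards the 1.0 written by the first.
df["signal"] = np.where(df["diff"] >= upperbound, 1.0, 0.0)
df["signal"] = np.where(df["diff"] <= lowerbound, -1.0, 0.0)
overwritten = df["signal"].tolist()  # [0.0, -1.0, 0.0]

# An `if` needs a single True/False, but the comparison yields one
# boolean per row, so pandas raises ValueError.
try:
    if df["diff"] >= upperbound:
        pass
    raised = False
except ValueError:
    raised = True

# Nesting the second np.where inside the first keeps both outcomes.
df["signal"] = np.where(df["diff"] >= upperbound, 1.0,
                        np.where(df["diff"] <= lowerbound, -1.0, 0.0))

print(overwritten, raised, df["signal"].tolist())
```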