How do I pass pandas DataFrame columns as arguments to np.where?

Asked: 2020-09-05 17:14:35

Tags: python pandas numpy

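I am trying to create a new column, data['adjusted_returns'], using np.where. The goal is to keep cc_returns unchanged, except on rows where position is -1: there cc_returns should be multiplied by -1, so that the sign of the return is flipped.

I get the error below when I pass DataFrame columns as arguments to np.where.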

Traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-c4a3a56db2d6> in <module>
----> 1 backtest_strategy(df)

<ipython-input-12-0f8c051d0e3d> in backtest_strategy(data)
      9     data['cc_returns'] = data['Adj Close'].pct_change()
     10 
---> 11     data['adjusted_returns'] = np.where ((data['position']== 1),(data['cc_returns']),0) | np.where((data['position'] == 0),(data['cc_returns']),0) | np.where((data['position']==-1),-1*(data['cc_returns']),0)

TypeError: ufunc 'bitwise_or' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Please help. Thanks.

Date,Open,High,Low,Close,Adj Close,Volume,avg,std,upper,lower,condition1,condition2,signal,position,cc_returns,adjusted_returns
2015-01-02,46.65999984741211,47.41999816894531,46.540000915527344,46.7599983215332,41.647891998291016,27913900,,,,,0,0,0,0,,0
2015-01-05,46.369998931884766,46.72999954223633,46.25,46.33000183105469,41.26490783691406,39673900,,,,,0,0,0,0,-0.009195763410850821,0
2015-01-06,46.380001068115234,46.75,45.540000915527344,45.650001525878906,40.659244537353516,36447900,,,,,0,0,0,0,-0.014677442197477575,0
2015-01-07,45.97999954223633,46.459999084472656,45.4900016784668,46.22999954223633,41.17583084106445,29114100,,,,,0,0,0,0,0.012705260749160008,0
2015-01-08,46.75,47.75,46.720001220703125,47.59000015258789,42.38714599609375,29645200,,,,,0,0,0,0,0.02941811082585999,0
2015-01-09,47.61000061035156,47.81999969482422,46.900001525878906,47.189998626708984,42.03087615966797,23944200,,,,,0,0,0,0,-0.00840513858750036,0
2015-01-12,47.41999816894531,47.540000915527344,46.36000061035156,46.599998474121094,41.50537872314453,23651900,,,,,0,0,0,0,-0.012502652443578954,0
2015-01-13,46.970001220703125,47.90999984741211,46.060001373291016,46.36000061035156,41.2916145324707,35270600,,,,,0,0,0,0,-0.005150276837604828,0
2015-01-14,45.959999084472656,46.2400016784668,45.619998931884766,45.959999084472656,40.93534851074219,29719600,,,,,0,0,0,0,-0.008628047746797485,0
2015-01-15,46.220001220703125,46.380001068115234,45.40999984741211,45.47999954223633,40.507835388183594,32750800,,,,,0,0,0,0,-0.010443617511804115,0

The code I am using

def backtest_strategy(data):
    data['condition1'] = np.where(data['Adj Close'] < data['lower'],1,0)
    data['condition2'] = np.where(data['Adj Close'] > data['upper'],-1,0)
    
    data['signal'] = np.where (data['condition1'] == 1,1,0) | np.where(data['condition2']==-1,-1,0)
    
    data['position'] = data['signal'].replace(to_replace=0,method='ffill')
    
    data['cc_returns'] = data['Adj Close'].pct_change()
    
    data['adjusted_returns'] = np.where ((data['position']== 1),(data['cc_returns']),0) | np.where((data['position'] == 0),(data['cc_returns']),0) | np.where((data['position']==-1),-1*(data['cc_returns']),0)

2 answers:

Answer 0 (score: 1)

  • Based on the conditions in the question, the value is data['cc_returns'] when data['position'] is 1 or 0, and -data['cc_returns'] when data['position'] is -1:
    • cond_1 = np.where((data['position'] == 1), (data['cc_returns']), 0)
    • cond_2 = np.where((data['position'] == 0), (data['cc_returns']), 0)
    • cond_3 = np.where((data['position'] == -1), -1*(data['cc_returns']), 0)
  • So the logic for data['adjusted_returns'] can be simplified to:
    • If 'position' is -1, return -1 * 'cc_returns'; otherwise return 'cc_returns'.
cond_4 = np.where((data['position'] == -1), -1*data['cc_returns'], data['cc_returns'])
data['adjusted_returns'] = cond_4
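
For context on the original error: between NumPy arrays, `|` is a bitwise OR, which NumPy defines for integer and boolean arrays but not for floating-point ones. Since cc_returns is a float column, each np.where result in the question is a float array, and joining them with `|` raises the TypeError. The sketch below reproduces this on a small made-up frame (the values are assumptions for illustration, not the asker's data) and shows the single np.where call avoiding the problem.

```python
import numpy as np
import pandas as pd

# Toy data: hypothetical values chosen only for illustration.
data = pd.DataFrame({
    "position": [1, 0, -1, -1],
    "cc_returns": [0.01, -0.02, 0.03, -0.04],
})

long_part = np.where(data["position"] == 1, data["cc_returns"], 0)
short_part = np.where(data["position"] == -1, -1 * data["cc_returns"], 0)

# Both pieces are float arrays, so bitwise OR is not defined for them.
try:
    long_part | short_part
    raised = False
except TypeError:
    raised = True

# One np.where with an "else" branch never needs to combine arrays.
adjusted = np.where(data["position"] == -1,
                    -data["cc_returns"],
                    data["cc_returns"])

print(raised)
print(adjusted)
```

The same `|` works on the 'signal' line of the function only because both np.where calls there return integer arrays.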

Updated function

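import numpy as np

def backtest_strategy(data):
    # 1 when the price is below the lower band, -1 when it is above the upper band
    data['condition1'] = np.where(data['Adj Close'] < data['lower'], 1, 0)
    data['condition2'] = np.where(data['Adj Close'] > data['upper'], -1, 0)

    # both arrays are integers here, so the bitwise OR is valid
    data['signal'] = np.where(data['condition1'] == 1, 1, 0) | np.where(data['condition2'] == -1, -1, 0)

    # carry the last non-zero signal forward
    data['position'] = data['signal'].replace(to_replace=0, method='ffill')

    # close-to-close returns
    data['cc_returns'] = data['Adj Close'].pct_change()

    # flip the sign of the return only where the position is -1
    cond_4 = np.where((data['position'] == -1), -1*data['cc_returns'], data['cc_returns'])
    data['adjusted_returns'] = cond_4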
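
To try the updated logic end to end, here is a self-contained sketch on made-up prices with constant bands. It is a variant rather than a copy of the function above, and several things in it are assumptions: the name `backtest_strategy_sketch`, the toy values, returning the frame, adding the two condition columns instead of OR-ing them, and building the position with `mask(...).ffill()` instead of `replace(..., method='ffill')`, which newer pandas releases deprecate.

```python
import numpy as np
import pandas as pd


def backtest_strategy_sketch(data):
    """Hypothetical variant of backtest_strategy, for illustration only."""
    data["condition1"] = np.where(data["Adj Close"] < data["lower"], 1, 0)
    data["condition2"] = np.where(data["Adj Close"] > data["upper"], -1, 0)

    # Assumption: the conditions never fire together (lower < upper),
    # so adding them gives the same signal as the bitwise OR.
    data["signal"] = data["condition1"] + data["condition2"]

    # Hide zeros, forward-fill the last non-zero signal, then restore
    # leading zeros where there was nothing to carry forward.
    signal = data["signal"]
    data["position"] = signal.mask(signal == 0).ffill().fillna(0).astype(int)

    data["cc_returns"] = data["Adj Close"].pct_change()

    # Flip the sign of the return only where the position is -1.
    data["adjusted_returns"] = np.where(
        data["position"] == -1, -data["cc_returns"], data["cc_returns"]
    )
    return data


# Toy prices and bands: hypothetical values, not the asker's data.
toy = pd.DataFrame({
    "Adj Close": [10.0, 8.0, 9.0, 12.0, 11.0],
    "lower": [9.0] * 5,
    "upper": [11.0] * 5,
})
result = backtest_strategy_sketch(toy)
print(result[["signal", "position", "cc_returns", "adjusted_returns"]])
```

With these values the position turns to 1 on the second row and to -1 on the fourth, so only the last two returns have their sign flipped.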

Answer 1 (score: 1)

You can use np.where to define a multiplier and then apply it to cc_returns:

data['cc_returns'] = data['Adj Close'].pct_change()

# create toy positions
np.random.seed(12)
data['position'] = np.random.choice((-1, 0, 1), data.shape[0])

data['adjusted_returns'] = np.where(data.position.ge(0), 1, -1)
data['adjusted_returns'] *= data.cc_returns

print(data[['position', 'cc_returns', 'adjusted_returns']])

Output

   position  cc_returns  adjusted_returns
0         1         NaN               NaN
1         0   -0.009196         -0.009196
2         0   -0.014677         -0.014677
3         1    0.012705          0.012705
4        -1    0.029418         -0.029418
5        -1   -0.008405          0.008405
6         1   -0.012503         -0.012503
7         0   -0.005150         -0.005150
8        -1   -0.008628          0.008628
9         0   -0.010444         -0.010444
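
As a quick cross-check, the multiplier approach and a single np.where with an else branch should give identical results as long as position only takes the values -1, 0 and 1. The sketch below compares them on a small made-up frame (the values and the variable names `via_multiplier` and `via_where` are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Toy data: hypothetical values; the first return is NaN, as after pct_change().
data = pd.DataFrame({
    "position": [1, 0, -1, -1, 0],
    "cc_returns": [np.nan, -0.01, 0.02, -0.03, 0.04],
})

# Multiplier approach: +1 for positions >= 0, -1 otherwise.
multiplier = np.where(data["position"].ge(0), 1, -1)
via_multiplier = multiplier * data["cc_returns"]

# Single np.where: negate only where the position is exactly -1.
via_where = np.where(data["position"] == -1,
                     -data["cc_returns"],
                     data["cc_returns"])

# equal_nan=True treats the leading NaN in both results as a match.
same = np.allclose(via_multiplier, via_where, equal_nan=True)
print(same)
```

The two would diverge for any negative position other than -1, because `ge(0)` negates every negative value while `== -1` negates only that one.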