Find the minimum of a column between two occurrences of a value in another column of a Python dataframe

Posted: 2019-05-02 20:30:46

Tags: python pandas trading algorithmic-trading

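I have daily stock price data with Open, High, Low and Close prices. I am creating a new column "signal" that takes the value "signal" or "none" depending on certain conditions.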

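Every time df['signal']=="signal", we have to compare it with the previous 3 occurrences of df['signal']=="signal". Suppose the current occurrence is the 4th signal. Then the previous occurrence of df['signal']=="signal" is the 3rd signal, the one before that is the 2nd signal, and the one before that is the 1st signal.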

I need to check whether the minimum of df['Low'] between signal 4 and signal 3 is greater than the minimum of df['Low'] between signal 1 and signal 2.

If it is greater, I need a new column df['trade'] with the value "BUY" on that row.
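Here is a minimal loop-based sketch of this rule. It assumes each window includes both of its signal rows, because the worked examples below count the Low of the signal rows themselves. It uses only the No, Low and signal columns of the sample. The function name mark_trades_loop is hypothetical.

```python
import pandas as pd

# Sample data from the question (only the columns the rule needs)
df = pd.DataFrame({
    "No": range(1, 16),
    "Low": [65, 74, 81, 88, 95, 99, 95, 102, 105, 99, 105, 110, 112, 71, 120],
    "signal": ["signal", "none", "none", "signal", "none", "none", "signal",
               "none", "signal", "none", "signal", "none", "none", "signal", "none"],
})


def mark_trades_loop(df):
    """Hypothetical helper: for every signal from the 4th one on, compare
    min(Low) from signal 3 to signal 4 with min(Low) from signal 1 to signal 2.
    Both signal rows are included in each window."""
    pos = [i for i, s in enumerate(df["signal"]) if s == "signal"]  # row positions of signals
    low = df["Low"].to_numpy()
    trade = [""] * len(df)
    for k in range(3, len(pos)):
        p1, p2, p3, p4 = pos[k - 3], pos[k - 2], pos[k - 1], pos[k]
        recent_min = low[p3:p4 + 1].min()  # signal 3 .. signal 4
        older_min = low[p1:p2 + 1].min()   # signal 1 .. signal 2
        trade[p4] = "BUY" if recent_min > older_min else "NONE"
    return pd.Series(trade, index=df.index)


df["Trade"] = mark_trades_loop(df)
print(df.loc[df["Trade"] != "", ["No", "Low", "signal", "Trade"]])
```

On the sample this marks rows 9 and 11 as BUY and row 14 as NONE.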

Sample data

    No  Open  High  Low  Close  signal  Trade
    1   75    95    65   50     signal
    2   78    94    74   77     none
    3   83    91    81   84     none
    4   91    101   88   93     signal
    5   104   121   95   103    none
    6   101   111   99   105    none
    7   97    108   95   101    signal
    8   103   113   102  106    none
    9   108   128   105  114    signal  BUY
    10  104   114   99   102    none
    11  110   130   105  115    signal  BUY
    12  112   122   110  115    none
    13  118   145   112  123    none
    14  123   143   71   133    signal  NONE
    15  130   150   120  140    none

In the sample data above, row 9 gets df['Trade'] == "BUY" for the following reason. The minimum df['Low'] = 95 between this signal and the previous one (rows 7 to 9) is greater than the minimum df['Low'] = 65 between the two signals before those (rows 1 to 4).

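Similarly, row 14 gets df['Trade'] == "NONE" for the following reason. The minimum df['Low'] = 71 between this signal and the previous one (rows 11 to 14) is not greater than the minimum df['Low'] = 95 between the two signals before those (rows 7 to 9).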
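The numbers in these two examples can be checked quickly. This assumes the Low values are indexed by No. Label-based .loc slicing includes both endpoints, which matches the inclusive windows described above.

```python
import pandas as pd

# Low prices from the sample table, indexed by the "No" column
low = pd.Series(
    [65, 74, 81, 88, 95, 99, 95, 102, 105, 99, 105, 110, 112, 71, 120],
    index=pd.RangeIndex(1, 16, name="No"),
)

# .loc label slices include both endpoints
row9_recent = low.loc[7:9].min()     # signals at No 7 and 9
row9_older = low.loc[1:4].min()      # signals at No 1 and 4
row14_recent = low.loc[11:14].min()  # signals at No 11 and 14
row14_older = low.loc[7:9].min()     # signals at No 7 and 9

print(row9_recent, row9_older, row9_recent > row9_older)
print(row14_recent, row14_older, row14_recent > row14_older)
```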

I need help with the code to achieve this.


    import pandas as pd
    import numpy as np

    df = pd.read_csv("Nifty.csv")
    cols = ['No', 'Low', 'signal']

    # signal: the 5-period EMA crosses above the 10-period EMA
    df['5EMA'] = df['Close'].ewm(span=5).mean()
    df['10EMA'] = df['Close'].ewm(span=10).mean()
    condition1 = df['5EMA'].shift(1) < df['10EMA'].shift(1)
    condition2 = df['5EMA'] > df['10EMA']
    df['signal'] = np.where(condition1 & condition2, 'signal', None)

    df1 = pd.concat([df[cols], df.loc[df.signal=='signal',cols].assign(signal='temp')]) \
            .sort_values(['No', 'signal'],ascending=[1,0])
    df1['g'] = (df1.signal == 'signal').cumsum()
    df1['Low_min'] = df1.groupby('g').Low.transform('min')
    s = df1.groupby('g').Low.min()
    buy = s[s.shift(1) > s.shift(3)].index.tolist()
    m1 = df1.signal.eq('signal') & df1.g.gt(3)
    m2 = df1.g.isin(buy) & m1
    df1['trade'] = np.select([m2, m1], ['Buy', 'None'], '')
    # copy the results back to the signal rows of df; all other rows stay ''
    df['trade'] = df1.loc[df1.signal=='signal', 'trade'].reindex(df.index, fill_value='')
    print(df)

Answer (score: 1):

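Your problem becomes simpler once you add some extra temporary rows. I set up a new dataframe containing only the required fields from the original df. I then cloned every row marked 'signal' and renamed the clone's signal value to 'temp': df.loc[df.signal=='signal',cols].assign(signal='temp'). After sorting, the rows are labeled into groups using the 'signal' rows and cumsum(). As a result, each group runs from one signal up to and including the next one. See the code below: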

    import numpy as np
    import pandas as pd
    from io import StringIO

    data = """No Open High Low Close signal
    1   75   95   65  50    signal
    2   78   94   74  77    none
    3   83   91   81  84    none
    4   91   101  88  93    signal
    5   104  121  95  103   none
    6   101  111  99  105   none
    7   97   108  95  101   signal
    8   103  113  102 106   none
    9   108  128  105 114   signal
    10  104  114  99  102   none
    11  110  130  105 115   signal
    12  112  122  110 115   none
    13  118  145  112 123   none
    14  123  143  71  133   signal
    15  130  150  120 140   none"""

    df = pd.read_csv(StringIO(data), sep=r'\s+')

    # cols which are used in this task
    cols = ['No', 'Low', 'signal']

    # create a new dataframe: clone all 'signal' rows with signal renamed to 'temp', then sort;
    # sorting 'signal' in descending order puts each 'temp' copy before its 'signal' row
    df1 = pd.concat([df[cols], df.loc[df.signal=='signal',cols].assign(signal='temp')]) \
            .sort_values(['No', 'signal'],ascending=[1,0])

    # set up group-number with cumsum() and get min() value from each group
    df1['g'] = (df1.signal == 'signal').cumsum()
    # the following field is just for reference, not needed for the calculation
    df1['Low_min'] = df1.groupby('g').Low.transform('min')

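In the new dataframe df1, every group except the first and the last now starts with a 'signal' row and ends with a 'temp' row, which is itself a copy of the next signal.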
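The group boundaries can be inspected directly with a small sketch. It rebuilds df1 the same way from the sample data, typed in as lists instead of parsed from text for brevity. It then lists the first and last No and the signal label of each group.

```python
import pandas as pd

df = pd.DataFrame({
    "No": range(1, 16),
    "Low": [65, 74, 81, 88, 95, 99, 95, 102, 105, 99, 105, 110, 112, 71, 120],
    "signal": ["signal", "none", "none", "signal", "none", "none", "signal",
               "none", "signal", "none", "signal", "none", "none", "signal", "none"],
})
cols = ["No", "Low", "signal"]

# Same construction as in the answer: duplicate signal rows as 'temp', sort, number groups
df1 = pd.concat([df[cols], df.loc[df.signal == "signal", cols].assign(signal="temp")]) \
        .sort_values(["No", "signal"], ascending=[True, False])
df1["g"] = (df1.signal == "signal").cumsum()

# First/last row of every group: which No it spans and which labels sit at its ends
bounds = df1.groupby("g").agg(
    first_no=("No", "first"), last_no=("No", "last"),
    first_sig=("signal", "first"), last_sig=("signal", "last"),
)
print(bounds)
```

Group 0 holds only the 'temp' copy of row 1. The last group ends on an ordinary 'none' row.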


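Based on your description, take row 9, the first row with df1.g == 4. For it, we compare df1.loc[df1.g==3, "Low_min"] (the minimum from signal 3 to signal 4) with df1.loc[df1.g==1, "Low_min"] (the minimum from signal 1 to signal 2).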

If we take the minimum of each group:

    s = df1.groupby('g').Low.min()

then the groups that should produce a buy are the ones satisfying s.shift(1) > s.shift(3):

    buy = s[s.shift(1) > s.shift(3)].index.tolist()
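At group g, s.shift(1) holds the window from signal g-1 to signal g, and s.shift(3) holds the window from signal g-3 to signal g-2. The sketch below hard-codes the Low_min values from the df1 printout as an assumption. It shows that group 3 also passes the test, because of the extra group 0 created by the first 'temp' row. That is why m1 requires g > 3.

```python
import pandas as pd

# Per-group minimum of Low for g = 0..6, copied from the Low_min column of df1
s = pd.Series([65, 65, 88, 95, 99, 71, 71], index=pd.Index(range(7), name="g"))

check = pd.DataFrame({
    "s": s,
    "prev_window": s.shift(1),   # window from signal g-1 to signal g
    "older_window": s.shift(3),  # window from signal g-3 to signal g-2
})
check["buy"] = check["prev_window"] > check["older_window"]  # NaN compares as False
print(check)
print(s[s.shift(1) > s.shift(3)].index.tolist())
```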

So let's set up the conditions:

    # m1: rows marked with signal;
    # skip the first 3 groups, which do not have enough previous signals
    m1 = df1.signal.eq('signal') & df1.g.gt(3)

    # m2: m1, and the group must also be in the buy list
    m2 = df1.g.isin(buy) & m1
    df1['trade'] = np.select([m2, m1], ['Buy', 'None'], '')
    #In [36]: df1
    #Out[36]: 
    #    No  Low  signal  g  Low_min trade
    #0    1   65    temp  0       65      
    #0    1   65  signal  1       65      
    #1    2   74    none  1       65      
    #2    3   81    none  1       65      
    #3    4   88    temp  1       65      
    #3    4   88  signal  2       88      
    #4    5   95    none  2       88      
    #5    6   99    none  2       88      
    #6    7   95    temp  2       88      
    #6    7   95  signal  3       95      
    #7    8  102    none  3       95      
    #8    9  105    temp  3       95      
    #8    9  105  signal  4       99   Buy
    #9   10   99    none  4       99      
    #10  11  105    temp  4       99      
    #10  11  105  signal  5       71   Buy
    #11  12  110    none  5       71      
    #12  13  112    none  5       71      
    #13  14   71    temp  5       71      
    #13  14   71  signal  6       71  None
    #14  15  120    none  6       71      
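np.select returns the choice for the first condition that is true. Because m2 is a subset of m1, m2 must come first; with the order reversed, every eligible row would be labeled 'None'. The toy illustration below uses hypothetical masks for four signal rows.

```python
import numpy as np

# Hypothetical masks for four signal rows
m1 = np.array([False, True, True, True])        # enough history (g > 3)
m2 = np.array([True, True, False, True]) & m1   # ... and the group is in the buy list

# First matching condition wins; rows matching neither get the default ''
trade = np.select([m2, m1], ["Buy", "None"], "")
print(trade)
```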

Once we have df1.trade, we can update the original dataframe:

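    # set up column `trade` with an empty string as the default and fill in the
    # signal rows from df1.trade (aligned on the index); assigning a new column
    # avoids chained assignment such as df.trade.update(...)
    df['trade'] = df1.loc[df1.signal=='signal', 'trade'].reindex(df.index, fill_value='')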
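The same result can also be reached without duplicating rows. This is a different technique from the df1 approach above, and it is only a sketch. It assumes non-signal rows are anything other than the string 'signal', so None works too. It groups by cumsum() alone, then adds the Low of the next signal row to each group's minimum separately. The function name trade_column is hypothetical.

```python
import numpy as np
import pandas as pd


def trade_column(df):
    """Hypothetical helper: same rule, no duplicated rows."""
    is_sig = df["signal"].eq("signal")
    sig_idx = df.index[is_sig]                   # index labels of the signal rows
    g = is_sig.cumsum()                          # group k starts at the k-th signal
    grp_min = df["Low"].groupby(g).min()         # min from signal k up to (not incl.) signal k+1
    sig_low = df.loc[is_sig, "Low"].to_numpy()   # Low on each signal row, in order
    # win[i]: min Low from signal i+1 to signal i+2, both ends included
    win = np.minimum(grp_min.loc[1:].to_numpy()[:-1], sig_low[1:])
    trade = pd.Series("", index=df.index, dtype=object)
    # from the 4th signal on: window (3 -> 4) vs. window (1 -> 2)
    trade.loc[sig_idx[3:]] = np.where(win[2:] > win[:-2], "Buy", "None")
    return trade


df = pd.DataFrame({
    "No": range(1, 16),
    "Low": [65, 74, 81, 88, 95, 99, 95, 102, 105, 99, 105, 110, 112, 71, 120],
    "signal": ["signal", "none", "none", "signal", "none", "none", "signal",
               "none", "signal", "none", "signal", "none", "none", "signal", "none"],
})
df["trade"] = trade_column(df)
print(df.loc[df["trade"] != "", ["No", "Low", "trade"]])
```

On the sample data this marks rows 9 and 11 as Buy and row 14 as None, like the df1 approach.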