How do I use an if condition in pandas?

Time: 2017-05-10 20:19:32

Tags: pandas

I am working with pandas and have a DataFrame with four columns:

Name    Sensex_index    Start_Date       End_Date
AAA        0.5           20/08/2016    25/09/2016 
AAA        0.8           26/08/2016    29/08/2016 
AAA        0.4           30/08/2016    31/08/2016
AAA        0.9           01/09/2016    05/09/2016
AAA        0.5           12/09/2016    22/09/2016
AAA        0.3           24/09/2016    29/09/2016
ABC        0.9           01/01/2017    15/01/2017
ABC        0.5           23/01/2017    30/01/2017
ABC        0.7           02/02/2017    15/03/2017

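Within each Name, when Sensex_index falls from a higher value to a lower one and then rises again, the run should terminate at the low point: the End_Date of that low row becomes the Termination_Date for every row in the run, and the Start_Date of the first row in the run becomes Actual_Start. The last row of each Name also ends a run. For example, I am looking for the following output: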

Name   Sensex_index  Actual_Start      Termination_Date 
AAA        0.5        20/08/2016          31/08/2016
AAA        0.8        20/08/2016          31/08/2016
AAA        0.4        20/08/2016          31/08/2016 [high to low; low to high,terminate]
AAA        0.9        01/09/2016          29/09/2016
AAA        0.5        01/09/2016          29/09/2016      
AAA        0.3        01/09/2016          29/09/2016 [end of AAA]
ABC        0.9        01/01/2017          30/01/2017  
ABC        0.5        01/01/2017          30/01/2017 [high to low; low to high,terminate]
ABC        0.7        02/02/2017          15/03/2017 [end of ABC]

1 answer:

Answer 0 (score: 0)

#Setup
import numpy as np
import pandas as pd

df = pd.DataFrame(data = [['AAA', 0.5, '20/08/2016', '25/09/2016'],
 ['AAA', 0.8, '26/08/2016', '29/08/2016'],
 ['AAA', 0.4, '30/08/2016', '31/08/2016'],
 ['AAA', 0.9, '01/09/2016', '05/09/2016'],
 ['AAA', 0.5, '12/09/2016', '22/09/2016'],
 ['AAA', 0.3, '24/09/2016', '29/09/2016'],
 ['ABC', 0.9, '01/01/2017', '15/01/2017'],
 ['ABC', 0.5, '23/01/2017', '30/01/2017'],
 ['ABC', 0.7, '02/02/2017', '15/03/2017']], columns = ['Name', 'Sensex_index', 'Start_Date', 'End_Date'])

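#Find the rows where the index changes from high to low and then back to high (a local minimum within each Name).
#raw=True passes each window as a NumPy array, so y[0], y[1], y[2] are positional.
df['change'] = df.groupby('Name')['Sensex_index'].transform(lambda x: x.rolling(3, center=True).apply(lambda y: float(y[1] < y[0] and y[1] < y[2]), raw=True))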
#Mark the last row of each Name as a termination point as well
df.loc[df.groupby('Name').tail(1).index, 'change'] = 1.0
#Set End_Date as Termination_Date for those changing points
df['Termination_Date'] = df.apply(lambda x: x.End_Date if x.change>0 else np.nan, axis=1)
#Set Actual_Start
df['Actual_Start'] = df.apply(lambda x: x.Start_Date if (x.name==0 
                                                          or x.Name!= df.iloc[x.name-1]['Name'] 
                                                          or df.iloc[x.name-1]['change']>0) 
                                                     else np.nan, axis=1)
#Back-fill the Termination_Date for the other rows.
df['Termination_Date'] = df['Termination_Date'].bfill()
#Forward-fill the Actual_Start for the other rows.
df['Actual_Start'] = df['Actual_Start'].ffill()
print(df)
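
As an alternative, here is a minimal sketch of the same idea without row-wise apply. It is not part of the original answer. It assumes the rows are already in chronological order within each Name and that the dates stay as plain strings. The helper name add_segments and the intermediate names is_low and segment are made up for illustration. Each local minimum closes a segment, so counting the minima that occur strictly before a row gives that row's segment number, and the first Start_Date and last End_Date of each segment are then broadcast back to its rows.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ['AAA', 0.5, '20/08/2016', '25/09/2016'],
        ['AAA', 0.8, '26/08/2016', '29/08/2016'],
        ['AAA', 0.4, '30/08/2016', '31/08/2016'],
        ['AAA', 0.9, '01/09/2016', '05/09/2016'],
        ['AAA', 0.5, '12/09/2016', '22/09/2016'],
        ['AAA', 0.3, '24/09/2016', '29/09/2016'],
        ['ABC', 0.9, '01/01/2017', '15/01/2017'],
        ['ABC', 0.5, '23/01/2017', '30/01/2017'],
        ['ABC', 0.7, '02/02/2017', '15/03/2017'],
    ],
    columns=['Name', 'Sensex_index', 'Start_Date', 'End_Date'],
)


def add_segments(frame):
    """Return a copy of frame with Actual_Start and Termination_Date added.

    Hypothetical helper: assumes rows are sorted chronologically within each Name.
    """
    out = frame.copy()
    by_name = out.groupby('Name')['Sensex_index']
    prev_val = by_name.shift(1)    # previous value within the same Name
    next_val = by_name.shift(-1)   # next value within the same Name

    # A row is a local minimum (high -> low -> high) when it is below both
    # neighbours; comparisons against NaN at the group edges are False.
    is_low = ((out['Sensex_index'] < prev_val) & (out['Sensex_index'] < next_val)).astype(int)

    # Segment number = how many local minima occur strictly before this row
    # within the same Name (cumulative count minus the row's own flag).
    segment = (is_low.groupby(out['Name']).cumsum() - is_low).rename('segment')

    keys = [out['Name'], segment]
    out['Actual_Start'] = out.groupby(keys)['Start_Date'].transform('first')
    out['Termination_Date'] = out.groupby(keys)['End_Date'].transform('last')
    return out


result = add_segments(df)
print(result[['Name', 'Sensex_index', 'Actual_Start', 'Termination_Date']])
```

Because the last row of each Name is always the last row of its final segment, it needs no special handling here. If the input is not guaranteed to be sorted, sort by Name and a parsed Start_Date first.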