Pandas time series: slicing a date range based on values in a column

Date: 2017-02-17 18:47:49

Tags: python pandas time-series stock

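I created a pandas DataFrame with stock data pulled from Yahoo. I added a percentage-change column and filtered for rows where the percentage change is > 0.02. That works fine. Now I want to add a further selection step that outputs a DataFrame in which, for every date where the previous condition (pct_change > 0.02) is True, I can see the 10 days before and the 10 days after that date. I really can't work out how to start. Any help would be appreciated. My code so far: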

import pandas_datareader.data as web
import datetime
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2017, 1, 27)
gspc2 = web.DataReader("^GSPC", 'yahoo', start, end)
gspc2.rename(columns={'Adj Close': 'Adj_Close'}, inplace=True)
gspc2['pct_change'] = gspc2['Adj_Close'].pct_change()
gspc2 = gspc2.loc[gspc2['pct_change'] > 0.02]
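The filtering step itself can be checked without the Yahoo download. The sketch below assumes a small, made-up price series (the values and dates are hypothetical) standing in for the `Adj_Close` column, and shows what `pct_change()` computes and which rows the `> 0.02` mask keeps.

```python
import pandas as pd

# Hypothetical closing prices standing in for the Yahoo 'Adj_Close' column
prices = pd.Series(
    [100.0, 103.0, 103.5, 101.0, 104.0],
    index=pd.date_range("2017-01-02", periods=5, freq="B"),
    name="Adj_Close",
)

# pct_change() computes price[t] / price[t-1] - 1; the first value is NaN
pct = prices.pct_change()

# Boolean mask of the days that rose by more than 2% (NaN compares as False)
mask = pct > 0.02
big_moves = prices[mask]

print(pct.round(4))
print(big_moves)
```

Keeping the mask in its own variable, instead of overwriting the frame with the filtered rows, leaves the neighbouring rows available for the before/after window the question asks about.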

2 answers:

Answer 0 (score: 0)

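One idea is to:

  1. Get the positions of the rows that meet your criterion.
  2. Expand each position by your range (n rows before and after).
  3. Filter out the duplicates.

Here is an example; I hope it helps.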

    import pandas as pd
    import numpy as np
    
    data = {'a': range(10, 24)}
    df = pd.DataFrame(data)
    df['b'] = (df.a % 5 == 0)  # marks rows 0, 5 and 10
    
    # number of rows to look back and forward
    n = 1
    
    # find the rows that meet the criterion (rows 0, 5 and 10)
    rows = np.where(df.b)[0]
    
    # expand each row to row-n .. row+n, staying inside the frame
    rows = [x for row in rows for x in range(row - n, row + n + 1) if 0 <= x < len(df)]
    
    # filter out duplicates and keep the rows in order
    rows = sorted(set(rows))
    
    print(df.loc[rows])
    

The output is:

         a      b
    0   10   True
    1   11  False
    4   14  False
    5   15   True
    6   16  False
    9   19  False
    10  20   True
    11  21  False
    

Answer 1 (score: 0)

I used Xin Huang's code as a base:

import datetime
import itertools

import pandas as pd
import pandas_datareader.data as web

# bringing stock data
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2017, 3, 27)
gspc2 = web.DataReader("UNG", 'yahoo', start, end)
gspc2.rename(columns={'Adj Close' :'Adj_Close'}, inplace=True)
gspc2['pct_change'] = gspc2['Adj_Close'].pct_change()


# gspc2['std_dev2'] = gspc2['pct_change'].std()*2
# gspc2['pct_change_mean'] = gspc2['pct_change'].mean()

# setting filter condition
condition = -0.07
row_filter = gspc2.index[gspc2['pct_change'] <= condition]
gspc2['row_filter'] = gspc2['pct_change'] <= condition

# window of days before and after the selected date
n = 3

selected_rows = [(pd.date_range(i - pd.DateOffset(days=n), periods=n*2+1)) for i in row_filter]
selected_rows = list(itertools.chain.from_iterable(selected_rows))

# cumulative return over the n-2 trading days after the day on which the condition occurred, not counting that day's own return
gspc2['cum_pct_change_ndays_after'] = gspc2.Adj_Close.shift(-(n-2))/gspc2.Adj_Close - 1
gspc2['n_days_avg_return'] = gspc2.cum_pct_change_ndays_after.mean()

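# keep only the trading days that fall inside a window: this skips window dates
# that are not market days (calendar days vs market days) and duplicate dates,
# and dropna() removes rows without a pct_change or forward-return value
final_df = gspc2[gspc2.index.isin(selected_rows)].dropna().sort_index(ascending=False)

#print(row_filter)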

print(final_df)
print(final_df[final_df.row_filter])
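`pd.date_range` in the answer above counts calendar days, so a window of n days can contain fewer than n trading days once weekends and holidays fall inside it. If "10 days" in the question means trading days, one vectorized alternative (a different technique from the list comprehension used in both answers) is a centered rolling max over the boolean condition: a row is selected when any row within n trading rows on either side is a hit. A minimal sketch, assuming hypothetical data in place of the Yahoo download:

```python
import pandas as pd

# Hypothetical daily returns on business days standing in for the Yahoo data
idx = pd.date_range("2017-01-02", periods=12, freq="B")
pct = pd.Series([0.0, 0.01, 0.03, 0.0, -0.01, 0.0,
                 0.0, 0.0, 0.025, 0.0, 0.0, 0.0], index=idx)

n = 2  # trading rows on each side of a hit

hit = (pct > 0.02).astype(int)

# Centered window of 2n+1 rows: the max is 1 if any row in the window is a hit.
# min_periods=1 keeps the first and last n rows from becoming NaN.
in_window = hit.rolling(2 * n + 1, center=True, min_periods=1).max().astype(bool)

selected = pct[in_window]
print(selected)
```

Overlapping windows are merged automatically, so no separate de-duplication step is needed.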