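I created a pandas DataFrame with stock data pulled from Yahoo. I added a percent-change column and filtered for rows where the percent change is > 0.02. No problem so far. Now I want to add an extra selection parameter that outputs a DataFrame showing the dates on which the condition (pct_change > 0.02) is True, together with the 10 days before and the 10 days after each of those dates. I really can't figure out how to start. Any help would be appreciated. My code so far: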
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2017, 1, 27)
gspc2 = web.DataReader("^GSPC", 'yahoo', start, end)
gspc2.rename(columns={'Adj Close': 'Adj_Close'}, inplace=True)
gspc2['pct_change'] = gspc2['Adj_Close'].pct_change()
# .ix was removed from pandas; .loc does the same boolean selection
gspc2 = gspc2.loc[gspc2['pct_change'] > 0.0200]
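Answer 0 (score: 0)
One idea is to find the rows that meet the condition, expand each one into a window of neighbouring row positions, and then select those rows.
Here is an example; I hope it helps.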
import pandas as pd
import numpy as np
data = { 'a' : range(10, 24) }
df = pd.DataFrame(data)
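df['b'] = (df.a % 5 == 0)  # marks rows 0, 5 and 10
# number of rows to look back and forward
n = 1
# find the positions of the rows that meet the criterion: rows 0, 5 and 10
rows = np.where(df.b)[0]
# expand each position into a window, staying inside the frame
rows = [x for row in rows for x in range(row - n, row + n + 1) if 0 <= x < len(df)]
# remove duplicates from overlapping windows and keep the rows in order
rows = sorted(set(rows))
print(df.iloc[rows])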
The output is:
a b
0 10 True
1 11 False
4 14 False
5 15 True
6 16 False
9 19 False
10 20 True
11 21 False
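The same positional-window idea carries over to a date-indexed frame like the one in the question. The following is only a minimal sketch: it assumes synthetic prices in place of the Yahoo download, and the helper name `rows_around_hits` and the sample values are made up for illustration. The window is counted in rows (trading days), not calendar days.

```python
import numpy as np
import pandas as pd


def rows_around_hits(df, mask, n):
    """Return the rows of df that lie within n row positions of a True in mask."""
    hits = np.flatnonzero(mask.to_numpy())
    # add the offsets -n..n to every hit position via broadcasting
    positions = (hits[:, None] + np.arange(-n, n + 1)).ravel()
    # drop positions that fall off either end, then de-duplicate and sort
    in_range = (positions >= 0) & (positions < len(df))
    positions = np.unique(positions[in_range])
    return df.iloc[positions]


# synthetic business-day prices (hypothetical values, not real market data)
dates = pd.bdate_range("2017-01-02", periods=12)
prices = [100, 100, 100, 103, 103, 103, 103, 103, 103, 103, 103, 106.5]
frame = pd.DataFrame({"Adj_Close": prices}, index=dates)
frame["pct_change"] = frame["Adj_Close"].pct_change()

# rows where the daily change exceeds 2%, plus one row on either side
window = rows_around_hits(frame, frame["pct_change"] > 0.02, n=1)
print(window)
```

For the question as asked, the call would use `n=10`; windows that overlap or run past the ends of the frame are handled by the range check and `np.unique`.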
Answer 1 (score: 0)
I used Xin Huang's code as a starting point.
import pandas as pd
import pandas_datareader.data as web
import datetime
import itertools
# download stock data
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2017, 3, 27)
gspc2 = web.DataReader("UNG", 'yahoo', start, end)
gspc2.rename(columns={'Adj Close' :'Adj_Close'}, inplace=True)
gspc2['pct_change'] = gspc2['Adj_Close'].pct_change()
# gspc2['std_dev2'] = gspc2['pct_change'].std()*2
# gspc2['pct_change_mean'] = gspc2['pct_change'].mean()
# setting filter condition
condition = -0.07
gspc2['row_filter'] = gspc2['pct_change'] <= condition
row_filter = gspc2.index[gspc2['row_filter']]
# window of days before and after the selected date
n = 3
selected_rows = [(pd.date_range(i - pd.DateOffset(days=n), periods=n*2+1)) for i in row_filter]
selected_rows = list(itertools.chain.from_iterable(selected_rows))
# cumulative return n-2 days after the day on which the condition occurred, not counting the return on the day itself
gspc2['cum_pct_change_ndays_after'] = gspc2.Adj_Close.shift(-(n-2))/gspc2.Adj_Close - 1
gspc2['n_days_avg_return'] = gspc2.cum_pct_change_ndays_after.mean()
# the windows hold calendar days, so keep only the dates that are actual market days
# (.loc raises KeyError on missing labels in current pandas); isin also avoids duplicates
# from overlapping windows, and dropna removes rows with undefined returns
final_df = gspc2.loc[gspc2.index.isin(selected_rows)].dropna().sort_index(ascending=False)
#print(row_filter)
print(final_df)
print(final_df[final_df.row_filter])
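The window above is measured in calendar days, so weekends and holidays shrink the number of rows it covers. If the window should count trading days (rows) instead, one alternative, which is not part of the answer above, is to spread the boolean filter with a centered rolling maximum. This is a minimal sketch on synthetic data; the function name `near_hit` and the sample values are made up for illustration.

```python
import pandas as pd


def near_hit(mask, n):
    """True for every row that lies within n rows of a True entry in mask."""
    # a centered window of 2n+1 rows has max 1 if any row in it is a hit;
    # min_periods=1 lets the window shrink at both ends of the series
    spread = mask.astype(int).rolling(window=2 * n + 1, center=True, min_periods=1).max()
    return spread.astype(bool)


# synthetic daily returns on business days (hypothetical values)
dates = pd.bdate_range("2017-01-02", periods=10)
pct = pd.Series([0.0, 0.01, -0.08, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.09], index=dates)

hit = pct <= -0.07               # same kind of condition as row_filter
selected = pct[near_hit(hit, n=2)]
print(selected)
```

Because the result is a boolean mask aligned with the original index, it can be used directly as `gspc2[near_hit(gspc2['row_filter'], n)]` without building a list of dates first.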