I have a pandas DataFrame time series that looks like this (about 1,000 rows, with the four columns below):
Date Values Avg +1 Stdev
01/01/2010 1.01 1.00 1.05
02/01/2010 1.02 1.00 1.05
03/01/2010 1.04 1.00 1.05
04/01/2010 -0.97 1.00 1.05
05/01/2010 1.12 1.00 1.05
06/01/2010 1.08 1.00 1.05
....
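What I want to do is create a fifth column (called "Trigger Date") that returns the date (from the index column) when the value in column 2 exceeds the threshold set in column 4, and returns nothing otherwise. An additional constraint is that the fifth column should also not return a date if the previous value had already exceeded the threshold in column 4.
In other words, the pseudocode for the problem is: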
If df['Values'] > df['+1 Stdev']
AND
If df['Values'] (for the row above) < df['+1 Stdev']
THEN
Return df['Date'] in new column df['Trigger Date']
ELSE
Leave row in df['Trigger Date'] blank
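Any help with solving this would be greatly appreciated.
Edit: a follow-up question. Is there any way to add a third constraint, so that no trigger date is returned if a trigger date has already occurred within the past XX days (for example, the past 30 days)? The expected output would then look like this: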
Date Values Avg +1 Stdev Trigger Date
0 01/01/2010 1.01 1.0 1.05 NaN
1 02/01/2010 1.02 1.0 1.05 NaN
2 03/01/2010 1.04 1.0 1.05 NaN
3 04/01/2010 -0.97 1.0 1.05 NaN
4 05/01/2010 1.12 1.0 1.05 05/01/2010
5 06/01/2010 1.08 1.0 1.05 NaN
6 07/01/2010 1.03 1.0 1.05 NaN
7 08/01/2010 1.07 1.0 1.05 NaN <- above threshold, but trigger occurred within last 30 days so don't return date
...
50 20/02/2010 1.12 1.0 1.05 20/02/2010 <- more than 30 days later, no trigger dates in between, so return date
Answer 0 (score: 0)
Use numpy.where, with shift to get the value from the row above:
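import numpy as np

# m1: the current value is above the threshold
m1 = df['Values'] > df['+1 Stdev']
# m2: the previous row's value was below the threshold (first row compares NaN, so False)
m2 = df['Values'].shift() < df['+1 Stdev']
df['Trigger Date'] = np.where(m1 & m2, df['Date'], np.nan)
print(df)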
Date Values Avg +1 Stdev Trigger Date
0 01/01/2010 1.01 1.0 1.05 NaN
1 02/01/2010 1.02 1.0 1.05 NaN
2 03/01/2010 1.04 1.0 1.05 NaN
3 04/01/2010 -0.97 1.0 1.05 NaN
4 05/01/2010 1.12 1.0 1.05 05/01/2010
5 06/01/2010 1.08 1.0 1.05 NaN
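For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the six sample rows from the question and uses Series.where instead of numpy.where, which is a swapped-in but equivalent way to keep the date only where the mask is true. The variable names above and prev_below are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Date": ["01/01/2010", "02/01/2010", "03/01/2010",
             "04/01/2010", "05/01/2010", "06/01/2010"],
    "Values": [1.01, 1.02, 1.04, -0.97, 1.12, 1.08],
    "Avg": [1.00] * 6,
    "+1 Stdev": [1.05] * 6,
})

# Current value is above the threshold.
above = df["Values"] > df["+1 Stdev"]
# Previous row's value was below the threshold (first row compares NaN -> False).
prev_below = df["Values"].shift() < df["+1 Stdev"]

# Keep the date where both conditions hold; other rows become missing.
df["Trigger Date"] = df["Date"].where(above & prev_below)
print(df)
```

Only the row dated 05/01/2010 satisfies both conditions here, because the following row (1.08) is above the threshold but so was the value before it.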
Edit:
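# Parse the day-first date strings into datetimes
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
m1 = df['Values'] > df['+1 Stdev']
m2 = df['Values'].shift() < df['+1 Stdev']
# Start of the 30-day window that ends at each row's date
a = df['Date'] - pd.Timedelta(30, unit='d')
# For each window, flag the positions whose next-row date falls inside it
L = [df['Date'].shift(-1).isin(pd.date_range(x, y, freq='d')) for x, y in zip(a, df['Date'] )]
# True where the next row's date falls inside any of the windows
m3 = np.logical_or.reduce(L)
mask = (m1 & m2) | ~m3
df.loc[mask, 'Trigger Date'] = df['Date']
print(df)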
Date Values Avg +1 Stdev Trigger Date
0 2010-01-01 1.01 1.0 1.05 NaT
1 2010-01-02 1.02 1.0 1.05 NaT
2 2010-01-03 1.04 1.0 1.05 NaT
3 2010-01-04 -0.97 1.0 1.05 NaT
4 2010-01-05 1.12 1.0 1.05 2010-01-05
5 2010-01-06 1.08 1.0 1.05 NaT
6 2010-02-20 1.12 1.0 1.05 2010-02-20
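Note that in the sample output above, the last row is selected through the ~m3 term (its shifted date is NaT) and not through m1 & m2, since the preceding value 1.08 is not below 1.05. As an alternative, the 30-day constraint can be applied explicitly with a single pass over the candidate rows. The following is only a sketch under stated assumptions: the function name trigger_dates and the parameter cooldown_days are hypothetical, the window is measured from the last trigger that was actually returned (a suppressed candidate does not restart it), and the row dated 19/02/2010 is invented for illustration because the question elides the rows before 20/02/2010.

```python
import pandas as pd

def trigger_dates(df, cooldown_days=30):
    """Return trigger dates as a datetime Series, NaT where none fires."""
    dates = pd.to_datetime(df["Date"], format="%d/%m/%Y")
    above = df["Values"] > df["+1 Stdev"]
    prev_below = df["Values"].shift() < df["+1 Stdev"]
    candidates = above & prev_below

    cooldown = pd.Timedelta(days=cooldown_days)
    out = pd.Series(pd.NaT, index=df.index, dtype="datetime64[ns]")
    last = None  # date of the most recent trigger that was returned
    for idx in candidates[candidates].index:
        current = dates.loc[idx]
        # Assumption: only returned triggers start a new cooldown window.
        if last is None or current - last > cooldown:
            out.loc[idx] = current
            last = current
    return out

# Rows 0-7 come from the question; the 19/02/2010 row is hypothetical.
demo = pd.DataFrame({
    "Date": ["01/01/2010", "02/01/2010", "03/01/2010", "04/01/2010",
             "05/01/2010", "06/01/2010", "07/01/2010", "08/01/2010",
             "19/02/2010", "20/02/2010"],
    "Values": [1.01, 1.02, 1.04, -0.97, 1.12, 1.08, 1.03, 1.07, 1.00, 1.12],
    "Avg": [1.0] * 10,
    "+1 Stdev": [1.05] * 10,
})
demo["Trigger Date"] = trigger_dates(demo)
print(demo)
```

The loop only visits rows that already satisfy the crossing condition, so for a frame of about 1,000 rows the cost should be negligible. If a suppressed crossing ought to restart the window instead, updating last outside the if branch would be one way to express that.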