This is my data. For each tick I want to create another column, 5min_Ret, whose value is the average of return over the last 5 minutes. Below is the desired output, with the calculation logic noted at the end of each row. The Logic column is only here for explanation; it will not be added to the final output.

times               value  size  return       5min_Ret     Logic
2016-06-01 9:07:11  14.2   595   0            0            First Tick 0
2016-06-01 9:08:11  14.2   2505  0.003527341  0.001763671  Avg of 1 to 2
2016-06-01 9:11:03  14.15  1     0            0.00117578   Avg of 1 to 3
2016-06-01 9:13:03  14.15  2200  0.003527341  0.002351561  Avg of 2 to 4
2016-06-01 9:15:04  14.2   480   0            0.00117578   Avg of 3 to 5
2016-06-01 9:15:04  14.2   2965  0.003527341  0.001763671  Avg of 3 to 6
2016-06-01 9:15:05  14.2   144   0            0.001410936  Avg of 3 to 7
2016-06-01 9:20:05  14.2   1856  0.003514942  0.001757471  Avg of 7 to 8
2016-06-01 9:22:06  14.25  300   0            0.001757471  Avg of 8 to 9
2016-06-01 9:25:06  14.25  856   0.003514942  0.001757471  Avg of 9 to 10

I think the dplyr package is very useful for group by. But for each tick, I have not managed to get the data grouped by a 5-minute interval. Any suggestions or help in R would be appreciated.

Thanks.

Answer 0 (score: 2)
You can achieve this with sapply. Let's assume your object is named df:
    # Assumes df$times is POSIXct, so 5*60 means 300 seconds.
    df$'5min_ret' <- sapply( X = seq_along( df$return ),
                             FUN = function(x) {
                               mean( df$return[ df$times >= df$times[x] - 5*60 &
                                                df$times <= df$times[x] ] )
                             } )
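Note that the seq_along call just creates a sequence with as many elements as there are rows in the data frame (10 in your case).

The function defined after FUN is the important part. It takes the subset of the data frame whose times fall within the last 5 minutes (no earlier than 5 minutes before the current tick and no later than the current tick, both ends inclusive) and returns the mean of the return column over that subset. sapply simply runs that function for each value of X (here, our 1:10 sequence).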
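
To make the window logic above concrete, here is a small sketch that evaluates it by hand for a single tick. The data is copied from the question; the variable names x and in_window, and the choice of tz = "UTC", are only assumptions made for illustration.

```r
# Sample data copied from the question (times parsed as POSIXct; UTC is an assumption).
df <- data.frame(
  times = as.POSIXct(c("2016-06-01 09:07:11", "2016-06-01 09:08:11",
                       "2016-06-01 09:11:03", "2016-06-01 09:13:03",
                       "2016-06-01 09:15:04", "2016-06-01 09:15:04",
                       "2016-06-01 09:15:05", "2016-06-01 09:20:05",
                       "2016-06-01 09:22:06", "2016-06-01 09:25:06"),
                     tz = "UTC"),
  return = c(0, 0.003527341, 0, 0.003527341, 0,
             0.003527341, 0, 0.003514942, 0, 0.003514942)
)

x <- 4  # look at the fourth tick (09:13:03)

# Logical mask: TRUE for ticks inside [time - 5 min, time], both ends inclusive.
in_window <- df$times >= df$times[x] - 5 * 60 & df$times <= df$times[x]

which(in_window)            # rows 2, 3 and 4
mean(df$return[in_window])  # about 0.002351561, the "Avg of 2 to 4" row
```

The sapply call does exactly this for x = 1, 2, ..., 10 and collects the ten means into one vector.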
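
Note, however, that calling a column 5min_ret is generally not a good idea, because R does not particularly like names of that form (a name that starts with a digit is not syntactically valid). I wrapped it in quotes when creating it to get around this, but I would recommend considering a different name.
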
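
As a rough illustration of the naming issue, the sketch below uses a toy data frame with placeholder values. The name ret_5min is a hypothetical alternative, not something from the question.

```r
# What R would turn the name into if asked for a syntactic one.
make.names("5min_ret")               # "X5min_ret"

# data.frame() applies that renaming by default (check.names = TRUE).
names(data.frame(`5min_ret` = 1))    # "X5min_ret"

# Toy data frame, only to illustrate naming (values are placeholders).
df <- data.frame(return = c(0, 0.003527341, 0))
df$`5min_ret` <- df$return           # assignment via $ keeps the name as typed
names(df)                            # "return" "5min_ret"

# Every later access needs backticks (or quotes); df$5min_ret is a syntax error.
df$`5min_ret`

# A syntactic alternative (ret_5min is a hypothetical name) needs no quoting.
df$ret_5min <- df$`5min_ret`
df$ret_5min
```

A name such as ret_5min also works unquoted inside dplyr verbs and formulas, which is usually where the quoting becomes tedious.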
Answer 1 (score: 1)
df = data.frame(times = c("2016-06-01 9:07:11", "2016-06-01 9:08:11", "2016-06-01 9:11:03",
                          "2016-06-01 9:13:03", "2016-06-01 9:15:04", "2016-06-01 9:15:04",
                          "2016-06-01 9:15:05", "2016-06-01 9:20:05", "2016-06-01 9:22:06",
                          "2016-06-01 9:25:06"),
                return = c(0, 0.003527341, 0, 0.003527341, 0, 0.003527341, 0,
                           0.003514942, 0, 0.003514942))
df$times = as.POSIXct(df$times)
df
times return
1 2016-06-01 09:07:11 0.000000000
2 2016-06-01 09:08:11 0.003527341
3 2016-06-01 09:11:03 0.000000000
4 2016-06-01 09:13:03 0.003527341
5 2016-06-01 09:15:04 0.000000000
6 2016-06-01 09:15:04 0.003527341
7 2016-06-01 09:15:05 0.000000000
8 2016-06-01 09:20:05 0.003514942
9 2016-06-01 09:22:06 0.000000000
10 2016-06-01 09:25:06 0.003514942
# another dataframe for the start/end timeframe
df1 = data.frame("start" = df$times - 5*60, "end" = as.POSIXct(df$times))
df1
start end
1 2016-06-01 09:02:11 2016-06-01 09:07:11
2 2016-06-01 09:03:11 2016-06-01 09:08:11
3 2016-06-01 09:06:03 2016-06-01 09:11:03
4 2016-06-01 09:08:03 2016-06-01 09:13:03
5 2016-06-01 09:10:04 2016-06-01 09:15:04
6 2016-06-01 09:10:04 2016-06-01 09:15:04
7 2016-06-01 09:10:05 2016-06-01 09:15:05
8 2016-06-01 09:15:05 2016-06-01 09:20:05
9 2016-06-01 09:17:06 2016-06-01 09:22:06
10 2016-06-01 09:20:06 2016-06-01 09:25:06
library(dplyr)
df.mean <- df1 %>%
  group_by(start, end) %>%
  summarize(ret.mean = mean(df$return[df$times >= start & df$times <= end]))
df.mean
Source: local data frame [9 x 3]
Groups: start [?]
start end ret.mean
(time) (time) (dbl)
1 2016-06-01 09:02:11 2016-06-01 09:07:11 0.000000000
2 2016-06-01 09:03:11 2016-06-01 09:08:11 0.001763670
3 2016-06-01 09:06:03 2016-06-01 09:11:03 0.001175780
4 2016-06-01 09:08:03 2016-06-01 09:13:03 0.002351561
5 2016-06-01 09:10:04 2016-06-01 09:15:04 0.001763670
6 2016-06-01 09:10:05 2016-06-01 09:15:05 0.001410936
7 2016-06-01 09:15:05 2016-06-01 09:20:05 0.001757471
8 2016-06-01 09:17:06 2016-06-01 09:22:06 0.001757471
9 2016-06-01 09:20:06 2016-06-01 09:25:06 0.001757471
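You will notice that groups 5 and 6 have been merged, because they share the same boundaries. I went through the procedure step by step so that you can follow the approach. You can put everything together in a single data frame later.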
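
One possible way to do that last step is sketched below; it is an assumption about what "putting it together" could look like, not the answerer's own code. It rebuilds df, df1 and df.mean as above, then joins the window means back onto the ticks so that the result has one row per tick again. The name result and the choice of tz = "UTC" are hypothetical.

```r
library(dplyr)

df <- data.frame(
  times = as.POSIXct(c("2016-06-01 09:07:11", "2016-06-01 09:08:11",
                       "2016-06-01 09:11:03", "2016-06-01 09:13:03",
                       "2016-06-01 09:15:04", "2016-06-01 09:15:04",
                       "2016-06-01 09:15:05", "2016-06-01 09:20:05",
                       "2016-06-01 09:22:06", "2016-06-01 09:25:06"),
                     tz = "UTC"),
  return = c(0, 0.003527341, 0, 0.003527341, 0,
             0.003527341, 0, 0.003514942, 0, 0.003514942)
)

# Window boundaries for every tick, as in the answer.
df1 <- data.frame(start = df$times - 5 * 60, end = df$times)

# One mean per distinct window (9 rows, since ticks 5 and 6 share a window).
df.mean <- df1 %>%
  group_by(start, end) %>%
  summarize(ret.mean = mean(df$return[df$times >= start & df$times <= end])) %>%
  ungroup()

# Attach each window mean back to its tick(s): 10 rows, original order.
result <- df %>%
  mutate(start = times - 5 * 60, end = times) %>%
  left_join(df.mean, by = c("start", "end")) %>%
  select(-start, -end)

result
```

Because ticks 5 and 6 share the timestamp 09:15:04, a purely time-based window gives both of them the same mean (about 0.00176367). The question's Logic column lists "Avg of 3 to 5" (0.00117578) for tick 5, so reproducing that exact value would need a rule based on row position in addition to the time window.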