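I need to perform previous-tick aggregation on my tick dataset at 5-minute intervals. What I want is similar to the aggregateTrades() function in the highfrequency package, but because of some other data-handling issues I need to solve this without using the highfrequency package. Here is my dataset: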
dput(tt)
structure(c(1371.25, NA, 1373.95, NA, NA, 1373, NA, 1373.95,
1373.9, NA, NA, 1374, 1374.15, NA, 1374, 1373.85, 1372.55, 1374.05,
1374.15, 1374.75, NA, NA, 1375.9, 1374.05, NA, NA, NA, NA, NA,
NA, NA, 1375, NA, NA, NA, NA, NA, 1376.35, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1376.25, NA, 1378, 1376.5, NA, NA, NA, 1378,
1378, NA, NA, 1378.8, 231.9, 231.85, NA, 231.9, 231.85, 231.9,
231.8, 231.9, 232.6, 231.95, 232.35, 232, 232.1, 232.05, 232.05,
232.05, 231.5, 231.3, NA, NA, 231.1, 231.1, 231.1, 231, 231,
230.95, 230.6, 230.6, 230.7, 230.6, 231, NA, 231, 231, 231.45,
231.65, 231.4, 231.7, 231.3, 231.25, 231.25, 231.4, 231.4, 231.85,
231.75, 231.5, 231.55, 231.35, NA, 231.5, 231.5, NA, 231.5, 231.25,
231.15, 231, 231, 231, 231.05, NA), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta", index = structure(c(1459481850,
1459482301, 1459482302, 1459482303, 1459482304, 1459482305, 1459482306,
1459482307, 1459482309, 1459482310, 1459482311, 1459482312, 1459482314,
1459482315, 1459482316, 1459482317, 1459482318, 1459482319, 1459482320,
1459482321, 1459482322, 1459482323, 1459482324, 1459482326, 1459482328,
1459482329, 1459482330, 1459482331, 1459482332, 1459482336, 1459482337,
1459482338, 1459482339, 1459482342, 1459482344, 1459482346, 1459482347,
1459482348, 1459482349, 1459482350, 1459482351, 1459482354, 1459482355,
1459482356, 1459482357, 1459482358, 1459482359, 1459482362, 1459482363,
1459482364, 1459482369, 1459482370, 1459482371, 1459482372, 1459482373,
1459482378, 1459482379, 1459482380, 1459482382, 1459482388), tzone = "Asia/Calcutta", tclass = c("POSIXct",
"POSIXt")), .Dim = c(60L, 2L), .Dimnames = list(NULL, c("A",
"B")), class = c("xts", "zoo"))
Here is my code for the previous-tick aggregation:
ag.5min.tt<-tt%>%filter(as.Date(index(tt)))%>%lapply(aggregate(by=cut(format(index(tt), format = "%H:%M:%S"), breaks = "5 mins", Fun=tail)))
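What I am trying to do with the code above is take the previous tick of the prices of A and B at every 5-minute interval, but I get an error. Please suggest how to fix it: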
Error in UseMethod("filter_") :
no applicable method for 'filter_' applied to an object of class "c('xts', 'zoo')"
Thanks.
Edit: converting the xts object to a data frame:
tt<-as.data.frame(tt)
tt<-data.frame(Time=rownames(tt), coredata(tt))
ag.5min.tt<-tt%>% filter(as.Date(index(tt)))%>%lapply(aggregate(by=cut(format(index(tt), format = "%H:%M:%S"), breaks = "5 mins", Fun=tail)))
New error:
Error in eval(substitute(expr), envir, enclos) :
filter condition does not evaluate to a logical vector.
Edit: I also tried:
tt$Time<- as.POSIXct(tt$Time, format="%Y-%m-%d %H:%M:%S")
ag.5min.tt<-tt%>% group_by(Time==as.Date(tt$Time))%>%lapply(aggregate(by=cut(format(tt$Time, format = "%H:%M:%S"), breaks = "5 mins", Fun=tail)))
Error:
Error in cut.default(format(tt$Time, format = "%H:%M:%S"), breaks = "5 mins", :
'x' must be numeric
In addition: Warning message:
In eval(substitute(expr), envir, enclos) :
Incompatible methods ("Ops.POSIXt", "Ops.Date") for "=="
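The result should look like this. Each 5-minute timestamp holds the value observed at that exact timestamp or, if that value is NA, the last non-NA value for stocks A and B: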
time A B
1 2016-04-01 09:00:00 NA NA
2 2016-04-01 09:05:00 NA NA
3 2016-04-01 09:10:00 NA NA
4 2016-04-01 09:15:00 1371.25 231.90
5 2016-04-01 09:20:00 1376.35 231.55
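Answer 0 (score: 2)
You can use .indexmin to get the minute of each observation in the time series index, then manipulate that value to subset the observations: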
ind <- which(diff(.indexmin(tt) %% 5) == -4)
res <- tt[ind]
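Here, .indexmin(tt) %% 5 returns the number of minutes elapsed since the last 5-minute mark. For our purposes, we want the last index of each run of 4s, which is the last observation in the minute before each 5-minute mark. To do that, we use diff and extract, with which, only the indices where the value crosses from 4 to 0 (giving a diff of -4).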
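The crossing rule can be checked on a plain vector of minutes without any xts object. The following is a minimal base R sketch; the vector mins is made-up illustration data standing in for the output of .indexmin(tt).

```r
# Hypothetical minute-of-hour values for eight consecutive ticks
mins <- c(14, 15, 15, 16, 19, 19, 20, 21)

# Minutes elapsed since the last 5-minute mark: 4 0 0 1 4 4 0 1
rem <- mins %% 5

# diff(rem)[i] is rem[i + 1] - rem[i], so a value of -4 at position i
# means tick i is in a "4" minute and tick i + 1 is in a "0" minute
ind <- which(diff(rem) == -4)

print(ind)        # positions of the last tick before each 5-minute mark
print(mins[ind])  # the minutes those ticks fall in
```

Note that this rule only fires when a tick in the last minute of a block is directly followed by a tick in the first minute of the next block. If the data jumps, say, from minute 13 to minute 15, the diff is -3 and that mark is skipped, which is one reason the merge-based approach further down is more robust.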
To illustrate, we modify the data you posted so that it contains observations that actually satisfy your extraction condition:
tt <- structure(c(1371.25, NA, 1373.95, NA, NA, 1373, NA, 1373.95,
1373.9, NA, NA, 1374, 1374.15, NA, 1374, 1373.85, 1372.55, 1374.05,
1374.15, 1374.75, NA, NA, 1375.9, 1374.05, NA, NA, NA, NA, NA,
NA, NA, 1375, NA, NA, NA, NA, NA, 1376.35, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1376.25, NA, 1378, 1376.5, NA, NA, NA, 1378,
1378, NA, NA, 1378.8, 231.9, 231.85, NA, 231.9, 231.85, 231.9,
231.8, 231.9, 232.6, 231.95, 232.35, 232, 232.1, 232.05, 232.05,
232.05, 231.5, 231.3, NA, NA, 231.1, 231.1, 231.1, 231, 231,
230.95, 230.6, 230.6, 230.7, 230.6, 231, NA, 231, 231, 231.45,
231.65, 231.4, 231.7, 231.3, 231.25, 231.25, 231.4, 231.4, 231.85,
231.75, 231.5, 231.55, 231.35, NA, 231.5, 231.5, NA, 231.5, 231.25,
231.15, 231, 231, 231, 231.05, NA), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta", index = structure(c(1459482299,
1459482301, 1459482302, 1459482303, 1459482304, 1459482305, 1459482306,
1459482307, 1459482309, 1459482310, 1459482311, 1459482312, 1459482314,
1459482315, 1459482316, 1459482317, 1459482318, 1459482319, 1459482320,
1459482321, 1459482322, 1459482323, 1459482324, 1459482326, 1459482328,
1459482329, 1459482330, 1459482331, 1459482332, 1459482336, 1459482337,
1459482338, 1459482339, 1459482342, 1459482344, 1459482346, 1459482347,
1459482348, 1459482349, 1459482590, 1459482591, 1459482594, 1459482595,
1459482596, 1459482597, 1459482598, 1459482599, 1459482602, 1459482603,
1459482604, 1459482609, 1459482610, 1459482611, 1459482612, 1459482613,
1459482618, 1459482619, 1459482620, 1459482622, 1459482628), tzone = "Asia/Calcutta", tclass = c("POSIXct",
"POSIXt")), .Dim = c(60L, 2L), .Dimnames = list(NULL,c("A",
"B")), class = c("xts", "zoo"))
## A B
##2016-04-01 09:14:59 1371.25 231.90
##2016-04-01 09:15:01 NA 231.85
##2016-04-01 09:15:02 1373.95 NA
##2016-04-01 09:15:03 NA 231.90
##2016-04-01 09:15:04 NA 231.85
##2016-04-01 09:15:05 1373.00 231.90
##2016-04-01 09:15:06 NA 231.80
##2016-04-01 09:15:07 1373.95 231.90
##2016-04-01 09:15:09 1373.90 232.60
##2016-04-01 09:15:10 NA 231.95
##2016-04-01 09:15:11 NA 232.35
##2016-04-01 09:15:12 1374.00 232.00
##2016-04-01 09:15:14 1374.15 232.10
##2016-04-01 09:15:15 NA 232.05
##2016-04-01 09:15:16 1374.00 232.05
##2016-04-01 09:15:17 1373.85 232.05
##2016-04-01 09:15:18 1372.55 231.50
##2016-04-01 09:15:19 1374.05 231.30
##2016-04-01 09:15:20 1374.15 NA
##2016-04-01 09:15:21 1374.75 NA
##2016-04-01 09:15:22 NA 231.10
##2016-04-01 09:15:23 NA 231.10
##2016-04-01 09:15:24 1375.90 231.10
##2016-04-01 09:15:26 1374.05 231.00
##2016-04-01 09:15:28 NA 231.00
##2016-04-01 09:15:29 NA 230.95
##2016-04-01 09:15:30 NA 230.60
##2016-04-01 09:15:31 NA 230.60
##2016-04-01 09:15:32 NA 230.70
##2016-04-01 09:15:36 NA 230.60
##2016-04-01 09:15:37 NA 231.00
##2016-04-01 09:15:38 1375.00 NA
##2016-04-01 09:15:39 NA 231.00
##2016-04-01 09:15:42 NA 231.00
##2016-04-01 09:15:44 NA 231.45
##2016-04-01 09:15:46 NA 231.65
##2016-04-01 09:15:47 NA 231.40
##2016-04-01 09:15:48 1376.35 231.70
##2016-04-01 09:15:49 NA 231.30
##2016-04-01 09:19:50 NA 231.25
##2016-04-01 09:19:51 NA 231.25
##2016-04-01 09:19:54 NA 231.40
##2016-04-01 09:19:55 NA 231.40
##2016-04-01 09:19:56 NA 231.85
##2016-04-01 09:19:57 NA 231.75
##2016-04-01 09:19:58 NA 231.50
##2016-04-01 09:19:59 NA 231.55
##2016-04-01 09:20:02 NA 231.35
##2016-04-01 09:20:03 1376.25 NA
##2016-04-01 09:20:04 NA 231.50
##2016-04-01 09:20:09 1378.00 231.50
##2016-04-01 09:20:10 1376.50 NA
##2016-04-01 09:20:11 NA 231.50
##2016-04-01 09:20:12 NA 231.25
##2016-04-01 09:20:13 NA 231.15
##2016-04-01 09:20:18 1378.00 231.00
##2016-04-01 09:20:19 1378.00 231.00
##2016-04-01 09:20:20 NA 231.00
##2016-04-01 09:20:22 NA 231.05
##2016-04-01 09:20:28 1378.80 NA
With this data we get:
print(res)
## A B
##2016-04-01 09:14:59 1371.25 231.90
##2016-04-01 09:19:59 NA 231.55
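To get the output you posted, you first need to generate a time series that contains every 5-minute tick you want, with its values set to NA. For this example, that time series (only the 5-minute ticks from 09:00 to 09:20 on 2016-04-01) can be: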
every.5.min <- structure(c(NA, NA, NA, NA, NA), .Dim = c(5L, 1L), .Dimnames = list(
NULL, "Empty"), index = structure(c(1459481400, 1459481700,
1459482000, 1459482300, 1459482600), tzone = "Asia/Calcutta", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta")
## Empty
##2016-04-01 09:00:00 NA
##2016-04-01 09:05:00 NA
##2016-04-01 09:10:00 NA
##2016-04-01 09:15:00 NA
##2016-04-01 09:20:00 NA
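The index of every.5.min does not have to be typed in by hand. As a sketch, assuming the same start time and time zone as above, the five timestamps can be generated with seq in base R; the names start and grid are illustrative only.

```r
# 2016-04-01 09:00:00 in Asia/Calcutta, given as epoch seconds so the
# value matches the first index entry of every.5.min exactly
start <- as.POSIXct(1459481400, origin = "1970-01-01", tz = "Asia/Calcutta")

# Five timestamps spaced 5 minutes apart: 09:00, 09:05, ..., 09:20
grid <- seq(start, by = "5 min", length.out = 5)

print(grid)
print(as.numeric(grid))  # same epoch seconds as the hard-coded index
```

With xts loaded, a call such as xts(rep(NA, length(grid)), order.by = grid) should wrap this index into an all-NA series equivalent to every.5.min.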
Then merge it with tt:
tt <- merge(tt, every.5.min, all=TRUE)[,1:ncol(tt)]
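With all=TRUE, the merge adds the rows that do not exist in the original tt (the 5-minute marks) and fills them with NA. Note that after the merge we keep only the columns from the original tt.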
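The effect of all=TRUE is easier to see on a tiny example. The sketch below uses base R data frames rather than xts objects, so it is an analogy and not the xts merge method itself; ticks and grid are made-up data with times given as plain seconds.

```r
# Three ticks around the 300-second and 600-second marks
ticks <- data.frame(time = c(299, 301, 599), A = c(1, 2, 3))

# The regular grid we want rows for
grid <- data.frame(time = c(0, 300, 600))

# Outer join: keep every time from both sides, sorted by time;
# grid times with no tick get NA in column A
merged <- merge(ticks, grid, by = "time", all = TRUE)

print(merged)
```

The grid rows now exist but hold NA, which is exactly the state that the fill step needs as input.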
Then, after the merge, fill every NA in tt with the previous value:
res <- do.call(merge, lapply(tt, na.locf))
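For readers unfamiliar with na.locf, the following base R sketch shows what "last observation carried forward" does to a single column. The helper locf is a hypothetical stand-in written for illustration; it is not the zoo or xts implementation.

```r
# Hypothetical helper: carry the last non-NA value forward
locf <- function(x) {
  # Position of the most recent non-NA value at each index (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  # Prepend NA so that idx == 0 maps to NA
  c(NA, x)[idx + 1]
}

x <- c(NA, 1, NA, NA, 2, NA)
print(locf(x))
```

Leading NAs stay NA because there is nothing earlier to carry forward, which matches the NA rows at 09:00, 09:05 and 09:10 in the expected output.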
Finally, use .indexmin and .indexsec to extract only the row at each 5-minute mark:
res <- res[.indexmin(res) %% 5 == 0 & .indexsec(res) == 0]
## A B
##2016-04-01 09:00:00 NA NA
##2016-04-01 09:05:00 NA NA
##2016-04-01 09:10:00 NA NA
##2016-04-01 09:15:00 1371.25 231.90
##2016-04-01 09:20:00 1376.35 231.55
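The merge, fill and extract steps above can also be collapsed into a single lookup per column. The sketch below is a different technique from the one in this answer: it uses base R findInterval on epoch seconds to find, for each grid time, the last non-NA tick at or before it. The function name prev_tick and the three sample ticks (taken from column A of the modified data) are illustrative assumptions.

```r
# Hypothetical helper: previous-tick value of one column on a time grid.
# times must be sorted in increasing order; all times are epoch seconds.
prev_tick <- function(times, values, grid) {
  ok  <- !is.na(values)
  # Index of the last non-NA tick at or before each grid time (0 if none)
  idx <- findInterval(grid, times[ok])
  out <- rep(NA_real_, length(grid))
  out[idx > 0] <- values[ok][idx[idx > 0]]
  out
}

# Ticks at 09:14:59, 09:15:01, 09:15:48 and 09:20:03 (Asia/Calcutta)
times  <- c(1459482299, 1459482301, 1459482348, 1459482603)
values <- c(1371.25, NA, 1376.35, 1376.25)

# Grid at 09:10:00, 09:15:00 and 09:20:00
grid <- c(1459482000, 1459482300, 1459482600)

print(prev_tick(times, values, grid))
```

This avoids creating the merged intermediate object, at the cost of handling each column separately.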
Answer 1 (score: 1)
Convert the xts object to a data frame so that dplyr can be used
library(dplyr)
library(tibble)
library(xts)
library(tidyr)
dtf <- tt %>%
as.data.frame() %>%
# add time information
rownames_to_column("time") %>%
mutate(time = as.POSIXct(time))
Generate the vector of times to pick: every 5 minutes (300 seconds) between the minimum and maximum time
timepick <- seq(trunc(min(dtf$time),"hour"), # start at the hour
max(dtf$time)+300 , 300)
Use this vector of breaks to select the last observation in each 5-minute interval.
ag.5min.tt <- dtf %>%
# Add missing interval (tibble() replaces the deprecated data_frame())
full_join(tibble(time = timepick), by = "time") %>%
arrange(time) %>% # important to arrange by time here
# Replace each NA with the most recent non-NA
fill(-time) %>%
# take selected values only
filter(time %in% timepick)
Convert back to an xts object
ag.5min.tt <- ag.5min.tt %>%
as.data.frame() %>%
column_to_rownames("time") %>%
as.xts()
ag.5min.tt
A B
2016-04-01 09:00:00 NA NA
2016-04-01 09:05:00 NA NA
2016-04-01 09:10:00 1371.25 231.90
2016-04-01 09:15:00 1371.25 231.90
2016-04-01 09:20:00 1378.80 231.05