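I have 30-second granularity data from a number of servers, and I'd like to roll each server's data up to 15-minute intervals.
My data frame looks like this:
dput(p):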
structure(list(DATE = c("2013-04-15 02:47:32", "2013-04-15 02:48:02",
"2013-04-15 02:48:32", "2013-04-15 02:49:02", "2013-04-15 02:49:32",
"2013-04-15 02:50:02", "2013-04-15 02:50:32", "2013-04-15 02:51:02",
"2013-04-15 02:51:32", "2013-04-15 02:52:02", "2013-04-15 02:52:32",
"2013-04-15 02:53:02", "2013-04-15 02:53:32", "2013-04-15 02:54:02",
"2013-04-15 02:54:32", "2013-04-15 02:55:02", "2013-04-15 02:55:32",
"2013-04-15 02:56:02", "2013-04-15 02:56:32", "2013-04-15 02:57:02",
"2013-04-29 17:33:07", "2013-04-29 17:33:37", "2013-04-29 17:34:07",
"2013-04-29 17:34:37", "2013-04-29 17:35:07", "2013-04-29 17:35:37",
"2013-04-29 17:36:07", "2013-04-29 17:36:37", "2013-04-29 17:37:07",
"2013-04-29 17:37:37", "2013-04-29 17:38:07", "2013-04-29 17:38:37",
"2013-04-29 17:39:07", "2013-04-29 17:39:37", "2013-04-29 17:40:07",
"2013-04-29 17:40:37", "2013-04-29 17:41:07", "2013-04-29 17:41:37",
"2013-04-29 17:42:07", "2013-04-29 17:42:37"), Server = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ServerA", "ServerB"), class = "factor"),
CPU = c(70L, 71L, 72L, 72L, 72L, 73L, 73L, 74L, 73L, 73L,
73L, 73L, 71L, 74L, 72L, 72L, 70L, 72L, 71L, 70L, 78L, 79L,
79L, 78L, 79L, 77L, 78L, 80L, 81L, 80L, 80L, 79L, 79L, 79L,
81L, 79L, 78L, 79L, 79L, 79L)), .Names = c("DATE", "Server",
"CPU"), class = "data.frame", row.names = c(NA, -40L))
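Is there an easy way to roll the 30-second data up to 15-minute data for each server? The data frame may contain more than two servers.
For example, if my data looks like the following (30-second samples), I need one CPU value for every 15-minute interval.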
DATE SERVER CPU
1 2013-04-15 02:47:32 ServerA 70
2 2013-04-15 02:48:02 ServerA 71
3 2013-04-15 02:48:32 ServerA 72
4 2013-04-15 02:49:02 ServerA 72
5 2013-04-15 02:49:32 ServerA 72
6 2013-04-15 02:50:02 ServerA 73
...
Answer 0 (score: 3)
First, convert your string to class POSIXct:
as.POSIXct(strptime("2013-04-15 02:47:32", "%Y-%m-%d %H:%M:%S"))
Next, unclass it to get the epoch time (seconds since 1970-01-01):
unclass(as.POSIXct(strptime("2013-04-15 02:47:32", "%Y-%m-%d %H:%M:%S")))
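Finally, round down to the start of the enclosing 15-minute interval (15 * 60 seconds) by dividing, flooring, and multiplying back:
floor(
  unclass(as.POSIXct(strptime("2013-04-15 02:47:32", "%Y-%m-%d %H:%M:%S"))) / (15*60)
) * (15*60)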
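To make the arithmetic concrete, here is a small sketch for the same example timestamp. It assumes the string is in UTC so the numbers are reproducible (the strptime() call above uses the machine's local time zone, so its epoch value can differ), and it uses as.numeric() in place of unclass(), which yields the same number of seconds.

```r
# Worked example of flooring one timestamp to its 15-minute interval.
# Assumption: the timestamp is interpreted as UTC.
ts <- as.POSIXct("2013-04-15 02:47:32", tz = "UTC")
epoch <- as.numeric(ts)                       # 1365994052 seconds since 1970-01-01 UTC
width <- 15 * 60                              # 900 seconds per interval
bucket_epoch <- floor(epoch / width) * width  # 1365993900 (152 seconds dropped)
bucket <- as.POSIXct(bucket_epoch, origin = "1970-01-01", tz = "UTC")
print(format(bucket, "%Y-%m-%d %H:%M:%S"))    # "2013-04-15 02:45:00"
```

The explicit format string matters: without it, R drops the seconds when printing times that fall exactly on a minute.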
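Putting it all together and converting the result back to POSIXct (every step is vectorised, so this works on a whole column such as p$DATE as well as on a single string):
as.POSIXct(floor(unclass(as.POSIXct(strptime("2013-04-15 02:47:32", "%Y-%m-%d %H:%M:%S")))/(15*60))*(15*60), origin='1970-01-01')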
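The one-liner above handles a single string; the sketch below applies the same arithmetic to a column of a data frame. The helper name floor_to_interval is hypothetical, only a few DATE values from the question are used, and times are assumed to be UTC so the printed buckets are reproducible.

```r
# Hypothetical helper: floor each timestamp to the start of its interval (UTC assumed).
floor_to_interval <- function(x, seconds) {
  as.POSIXct(floor(as.numeric(x) / seconds) * seconds,
             origin = "1970-01-01", tz = "UTC")
}

# A few DATE strings taken from the question's data
p <- data.frame(
  DATE = c("2013-04-15 02:47:32", "2013-04-15 02:57:02",
           "2013-04-29 17:33:07", "2013-04-29 17:42:37"),
  Server = c("ServerA", "ServerA", "ServerB", "ServerB"),
  stringsAsFactors = FALSE
)
p$ts <- as.POSIXct(p$DATE, tz = "UTC")         # parse the whole column at once
p$bucket <- floor_to_interval(p$ts, 15 * 60)   # start of each 15-minute interval
print(format(p$bucket, "%Y-%m-%d %H:%M:%S"))
# expected: both ServerA rows map to 02:45:00, both ServerB rows to 17:30:00
```

The resulting bucket column can then serve as a grouping key together with Server.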
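Answer 1 (score: 0)
What I would do:
As topchef suggested, work with POSIXct rather than strings. Once I've stored your data in L, my structure looks just like yours, except that instead of your DATE column I work with a ts column, obtained the way topchef suggested: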
L$ts <- as.POSIXct(L$DATE)
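You want to aggregate values, so it seems natural to me to add the aggregation key (the start of each row's 15-minute interval) to the data.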
baseSecond <- function(x, seconds) {
as.POSIXct(floor(unclass(x) / seconds) * seconds,
origin='1970-01-01 00:00.00 UTC')
}
L$base <- baseSecond(L$ts, 15*60)
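To finish the job I'd use the standard aggregate function, grouping the CPU values by server and by interval:
aggregate(L$CPU, by=list(Server=L$Server, base=L$base), FUN=mean)
The third argument (FUN) lets you choose how the values in each group are combined, e.g. mean, max, or function(x) x[1] to keep the first sample.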
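A self-contained sketch of this approach on the question's data, using the formula interface of aggregate() and grouping by both Server and the 15-minute key. Assumptions: the data frame is rebuilt from the dput() output (20 samples per server, 30 seconds apart), times are treated as UTC, and baseSecond() gets tz = "UTC" added so the output does not depend on the local time zone.

```r
# Hypothetical rebuild of the question's data (times assumed UTC):
# 20 samples per server, 30 seconds apart, CPU values from the dput() output.
starts <- as.POSIXct(c("2013-04-15 02:47:32", "2013-04-29 17:33:07"), tz = "UTC")
L <- data.frame(
  Server = rep(c("ServerA", "ServerB"), each = 20),
  ts = starts[rep(1:2, each = 20)] + rep(0:19 * 30, times = 2),
  CPU = c(70, 71, 72, 72, 72, 73, 73, 74, 73, 73, 73, 73, 71, 74, 72, 72, 70, 72, 71, 70,
          78, 79, 79, 78, 79, 77, 78, 80, 81, 80, 80, 79, 79, 79, 81, 79, 78, 79, 79, 79),
  stringsAsFactors = FALSE
)

# Same idea as baseSecond() above; tz = "UTC" is an added assumption.
baseSecond <- function(x, seconds) {
  as.POSIXct(floor(as.numeric(x) / seconds) * seconds,
             origin = "1970-01-01", tz = "UTC")
}
L$base <- baseSecond(L$ts, 15 * 60)

# One row per (Server, 15-minute bucket); FUN decides how samples are combined.
avg  <- aggregate(CPU ~ Server + base, data = L, FUN = mean)
peak <- aggregate(CPU ~ Server + base, data = L, FUN = max)
print(avg)   # ServerA 02:45 bucket: 72.05, ServerB 17:30 bucket: 79.05
print(peak)  # ServerA: 74, ServerB: 81
```

Because Server is part of the grouping, the same call works unchanged when the data frame holds more than two servers.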
Answer 2 (score: 0)
I came up with the following solution. There are probably better and faster ones, but it works for now:
apply.periodly <- function (x, FUN, period, k=1, ...)
{
if (!require("xts")) {
stop("Need 'xts'")
}
ep <- endpoints(x, on=period, k=k)
period.apply(x, ep, FUN, ...)
}
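# assumes x holds the data with DATE (POSIXct), SERVER and CPU columns
servers <- unique(as.character(x$SERVER))
total_df <- data.frame(DATE=as.POSIXct(character()), CPU=numeric(), SERVER=character(), stringsAsFactors=FALSE)
for(i in seq_along(servers)) {
  # '==' filters rows; a named argument 'SERVER=' would be silently ignored by subset()
  y <- subset(x, SERVER == servers[i])
  mydata.xts <- xts(y$CPU, order.by = y$DATE)
  # mean CPU per 15-minute clock period; each result row is stamped with
  # the time of the last observation in that period
  mydata.15M <- apply.periodly(x = mydata.xts, FUN = mean, period = "minutes", k = 15)
  new_df <- data.frame(date=index(mydata.15M), coredata(mydata.15M))
  colnames(new_df) <- c("DATE", "CPU")
  new_df$SERVER <- servers[i]
  total_df <- rbind(total_df, new_df)
}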