How to aggregate 30-second data into 15-minute data in R

Date: 2013-04-30 21:41:06

Tags: r

I collect data at 30-second granularity from a number of servers. I want to roll up each server's data to 15-minute intervals.

My data frame looks like this:

Output of dput(p):

structure(list(DATE = c("2013-04-15   02:47:32", "2013-04-15   02:48:02", 
"2013-04-15   02:48:32", "2013-04-15   02:49:02", "2013-04-15   02:49:32", 
"2013-04-15   02:50:02", "2013-04-15   02:50:32", "2013-04-15   02:51:02", 
"2013-04-15   02:51:32", "2013-04-15   02:52:02", "2013-04-15   02:52:32", 
"2013-04-15   02:53:02", "2013-04-15   02:53:32", "2013-04-15   02:54:02", 
"2013-04-15   02:54:32", "2013-04-15   02:55:02", "2013-04-15   02:55:32", 
"2013-04-15   02:56:02", "2013-04-15   02:56:32", "2013-04-15   02:57:02", 
"2013-04-29   17:33:07", "2013-04-29   17:33:37", "2013-04-29   17:34:07", 
"2013-04-29   17:34:37", "2013-04-29   17:35:07", "2013-04-29   17:35:37", 
"2013-04-29   17:36:07", "2013-04-29   17:36:37", "2013-04-29   17:37:07", 
"2013-04-29   17:37:37", "2013-04-29   17:38:07", "2013-04-29   17:38:37", 
"2013-04-29   17:39:07", "2013-04-29   17:39:37", "2013-04-29   17:40:07", 
"2013-04-29   17:40:37", "2013-04-29   17:41:07", "2013-04-29   17:41:37", 
"2013-04-29   17:42:07", "2013-04-29   17:42:37"), Server = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ServerA", "ServerB"), class = "factor"), 
    CPU = c(70L, 71L, 72L, 72L, 72L, 73L, 73L, 74L, 73L, 73L, 
    73L, 73L, 71L, 74L, 72L, 72L, 70L, 72L, 71L, 70L, 78L, 79L, 
    79L, 78L, 79L, 77L, 78L, 80L, 81L, 80L, 80L, 79L, 79L, 79L, 
    81L, 79L, 78L, 79L, 79L, 79L)), .Names = c("DATE", "Server", 
"CPU"), class = "data.frame", row.names = c(NA, -40L))

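Is there a simple way to roll up the 30-second data for each server into 15-minute data? This data frame may contain more than two servers.

For example, suppose my data looks like the following, sampled every 30 seconds. I need the CPU figures at 15-minute intervals.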

  DATE                SERVER  CPU
1 2013-04-15 02:47:32 ServerA 70
2 2013-04-15 02:48:02 ServerA 71
3 2013-04-15 02:48:32 ServerA 72
4 2013-04-15 02:49:02 ServerA 72
5 2013-04-15 02:49:32 ServerA 72
6 2013-04-15 02:50:02 ServerA 73
   :
   :
   :
   :

3 answers:

Answer 0 (score: 3)

First, convert your string to class POSIXct:

as.POSIXct(strptime("2013-04-15 02:47:32", "%Y-%m-%d %H:%M:%S"))

Next, unclass it to get the epoch time (the number of seconds since 1970-01-01):

unclass(as.POSIXct(strptime("2013-04-15 02:47:32", "%Y-%m-%d %H:%M:%S")))

Finally, drop the seconds that fall past the most recent 15-minute boundary (15 * 60 seconds):

floor(unclass(as.POSIXct(strptime("2013-04-15 02:47:32", 
                                  "%Y-%m-%d %H:%M:%S"))
             ) / (15*60)
     ) * (15*60)

Putting it all together and converting the result back to POSIXct:

as.POSIXct(floor(unclass(as.POSIXct(strptime("2013-04-15   02:47:32", "%Y-%m-%d %H:%M:%S")))/(15*60))*(15*60), origin='1970-01-01 00:00.00 UTC')
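The same steps are vectorized, so they can be applied to a whole column of timestamps at once. The following is a minimal sketch rather than part of the original answer; the sample timestamps and the variable names are made up for illustration, and the times are assumed to be UTC so that the result does not depend on the local time zone.

```r
# Hypothetical sample timestamps (assumed to be in UTC)
dates <- c("2013-04-15 02:47:32", "2013-04-15 02:59:32", "2013-04-15 03:00:02")

# Step 1: parse the strings into POSIXct
ts <- as.POSIXct(dates, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Step 2: seconds since 1970-01-01 UTC
secs <- as.numeric(ts)

# Step 3: round down to a multiple of 15 * 60 seconds
bucket_secs <- floor(secs / (15 * 60)) * (15 * 60)

# Convert the bucket start back to POSIXct
bucket <- as.POSIXct(bucket_secs, origin = "1970-01-01", tz = "UTC")

print(format(bucket, "%H:%M:%S"))
# "02:45:00" "02:45:00" "03:00:00"
```

The first two timestamps fall into the same 15-minute bucket, so any per-bucket summary would treat them as one group.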

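Answer 1 (score: 0)

Here is what I would do:

As topchef suggested, work with POSIXct rather than strings. Once I have stored your data in L, my structure looks like yours, except that I add a ts column derived from your DATE column, obtained the way topchef suggested: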

L$ts <- as.POSIXct(L$DATE)

You want to aggregate values, so it seems natural to me to add the aggregation key to the data.

baseSecond <- function(x, seconds) { 
  as.POSIXct(floor(unclass(x) / seconds) * seconds,
             origin='1970-01-01 00:00.00 UTC')
}

L$base <- baseSecond(L$ts, 15*60)

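To finish the task, I would use the standard aggregate function, grouping by both server and 15-minute bucket and averaging the CPU values:

aggregate(L$CPU, by=list(Server=L$Server, base=L$base), FUN=mean)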

The third argument lets you choose how the data are aggregated.
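As an illustration, here is a small end-to-end sketch of this approach. The sample rows are invented, the times are assumed to be UTC, and taking the mean of CPU per server and bucket is an assumption about the summary the question is after; swap in max, median, or another function as needed.

```r
# Hypothetical sample in the same shape as the question's data frame
L <- data.frame(
  DATE   = c("2013-04-15 02:47:32", "2013-04-15 02:48:02", "2013-04-15 03:00:02",
             "2013-04-29 17:33:07", "2013-04-29 17:33:37"),
  Server = c("ServerA", "ServerA", "ServerA", "ServerB", "ServerB"),
  CPU    = c(70, 72, 74, 78, 80)
)

# Parse the timestamps (UTC is an assumption made for reproducibility)
L$ts <- as.POSIXct(L$DATE, tz = "UTC")

# Round each timestamp down to the start of its interval
baseSecond <- function(x, seconds) {
  as.POSIXct(floor(as.numeric(x) / seconds) * seconds,
             origin = "1970-01-01", tz = "UTC")
}
L$base <- baseSecond(L$ts, 15 * 60)

# One row per (Server, base) pair; the mean CPU ends up in column x
result <- aggregate(L$CPU, by = list(Server = L$Server, base = L$base), FUN = mean)
print(result)
```

With this sample, ServerA yields two buckets (02:45 and 03:00) and ServerB yields one (17:30), so the result has three rows.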

Answer 2 (score: 0)

I came up with the following solution. There may be better and faster ones, but it works for now:

apply.periodly <- function (x, FUN, period, k=1, ...) 
{
  if (!require("xts")) {
    stop("Need 'xts'")
  }
  ep <- endpoints(x, on=period, k=k)
  period.apply(x, ep, FUN, ...)
}

total_df <- data.frame(DATE=as.POSIXct(character()), CPU=as.numeric(character()),  SERVER=character())


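# Assumes x is the input data frame with a POSIXct DATE column, and that
# servers holds the distinct server names, e.g. servers <- unique(x$SERVER)
for (i in seq_along(servers)) {

    # Keep only the rows for the current server (comparison needs ==, not =)
    y <- subset(x, SERVER == servers[i])
    mydata.xts <- xts(y$CPU, order.by = y$DATE)
    # Mean CPU over each 15-minute period
    mydata.15M <- apply.periodly(x = mydata.xts, FUN = mean, period = "minutes", k = 15)

    # Convert the xts result back to a data frame and tag it with the server name
    new_df <- data.frame(date = index(mydata.15M), coredata(mydata.15M))
    colnames(new_df) <- c("DATE", "CPU")
    new_df$SERVER <- as.character(servers[i])

    total_df <- rbind(total_df, new_df)

}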
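To see what endpoints and period.apply do in isolation, the sketch below runs them on a single made-up series. It assumes the xts package is installed (the code is skipped otherwise) and that the timestamps are UTC; the series and variable names are hypothetical.

```r
if (requireNamespace("xts", quietly = TRUE)) {
  # 40 hypothetical samples, 30 seconds apart, starting at 02:47:32 UTC
  ts  <- as.POSIXct("2013-04-15 02:47:32", tz = "UTC") + 30 * (0:39)
  cpu <- xts::xts(rep(c(70, 72), 20), order.by = ts)

  # Row indices at which each 15-minute period ends;
  # the vector starts at 0 and ends at the number of rows
  ep <- xts::endpoints(cpu, on = "minutes", k = 15)

  # Apply mean to the rows between consecutive endpoints
  out <- xts::period.apply(cpu, ep, mean)
  print(ep)
  print(out)
}
```

Each row of the result carries the timestamp of the last observation in its period rather than the start of the interval, which is worth keeping in mind when comparing with the floor-based answers.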