R time series: median of the first and last few minutes of certain groups

Time: 2016-02-11 08:20:10

Tags: r

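I am trying to calculate the median at the beginning and at the end of certain groups in one of my columns. To make this clearer, I will explain it using sample data: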

Time <- c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51", "2015-08-21T10:06:51", 
          "2015-08-21T10:08:51", "2015-08-21T10:10:51","2015-08-21T10:12:51", "2015-08-21T10:14:51", 
          "2015-08-21T10:16:51", "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51")
x <-  c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386, 40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <-  c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
data <- data.frame(Time,x,y)
data$Time <- as.POSIXct(data$Time, format = "%Y-%m-%dT%H:%M:%S")

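So in this case I need the median of column x over the first 2 minutes and the last 2 minutes of each level of y. For level a, the beginning is "2015-08-21T10:00:51", "2015-08-21T10:02:51" (x = 38.855, 38.664, median = 38.7595) and the end is "2015-08-21T10:08:51", "2015-08-21T10:10:51" (x = 40.195, 40.386, median = 40.2905). For level b, the beginning is "2015-08-21T10:12:51", "2015-08-21T10:14:51" (x = 40.386, 40.195, median = 40.2905) and the end is "2015-08-21T10:20:51", "2015-08-21T10:22:51" (x = 38.664, 40.386, median = 39.525).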

Ideally, the result of this calculation would be returned as a new data frame.

Thanks for your help!

Cheers

1 answer:

Answer 0: (score: 1)

Using the dplyr and tidyr libraries, you can do the following:

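library(dplyr)
library(tidyr)

data %>%
  group_by(y) %>%
  # keep the first two and the last two rows of each group
  slice(c(1, 2, n(), n() - 1)) %>%
  group_by(y) %>%
  # after slice(), rows 1-2 are the start of the group and rows 3-4 the end
  mutate(firstGroup = ifelse(row_number(y) < 3, 'medianGroup1', 'medianGroup2')) %>%
  group_by(y, firstGroup) %>%
  summarise(medianValue = median(x)) %>%
  # one row per level of y, one column per start/end median
  spread(firstGroup, medianValue)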

The output is as follows:

Source: local data frame [2 x 3]

       y medianGroup1 medianGroup2
  (fctr)        (dbl)        (dbl)
1      a      38.7595      40.2905
2      b      40.2905      39.5250

Note that I show each step explicitly in the code, but it could be condensed further.
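As one possible condensed form, the sketch below computes both medians in a single summarise() call. It rebuilds the sample data from the question so that it runs on its own. The names by_rows, by_time and window_secs are illustrative and do not come from the original answer. The second variant assumes that "2 minutes" means a time window measured from the first and last timestamp of each group, rather than a fixed number of rows. With this evenly spaced data both variants should select the same rows.

```r
library(dplyr)

# Sample data from the question, rebuilt so the block is self-contained.
# tz = "UTC" is an assumption made here for reproducibility.
Time <- seq(as.POSIXct("2015-08-21 10:00:51", tz = "UTC"), by = 120, length.out = 12)
x <- c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386,
       40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <- rep(c("a", "b"), each = 6)
data <- data.frame(Time, x, y)

# Row-count variant: median of the first two and last two rows per group.
by_rows <- data %>%
  group_by(y) %>%
  summarise(medianGroup1 = median(head(x, 2)),
            medianGroup2 = median(tail(x, 2)))

# Time-window variant (assumed interpretation): median of all rows within
# window_secs seconds of the group's first and last timestamp.
window_secs <- 120
by_time <- data %>%
  group_by(y) %>%
  summarise(medianGroup1 = median(x[Time <= min(Time) + window_secs]),
            medianGroup2 = median(x[Time >= max(Time) - window_secs]))

print(by_rows)
print(by_time)
```

The time-window variant may be the safer choice when the sampling interval is irregular, since a fixed row count would then no longer correspond to 2 minutes.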