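I am trying to compute the median of x at the beginning and at the end of each group in one of my columns. To make this clearer, I will explain using sample data: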
Time <- c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51", "2015-08-21T10:06:51",
"2015-08-21T10:08:51", "2015-08-21T10:10:51","2015-08-21T10:12:51", "2015-08-21T10:14:51",
"2015-08-21T10:16:51", "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51")
x <- c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386, 40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
data <- data.frame(Time,x,y)
data$Time <- as.POSIXct(data$Time, format = "%Y-%m-%dT%H:%M:%S")
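In this case I want the median of column x over the first 2 minutes and the last 2 minutes of each class in y.
For class a, the beginning is "2015-08-21T10:00:51", "2015-08-21T10:02:51" (so x = 38.855, 38.664 and median = 38.7595) and the end is "2015-08-21T10:08:51", "2015-08-21T10:10:51" (so x = 40.195, 40.386 and median = 40.2905).
For class b, the beginning is "2015-08-21T10:12:51", "2015-08-21T10:14:51" (so x = 40.386, 40.195 and median = 40.2905) and the end is "2015-08-21T10:20:51", "2015-08-21T10:22:51" (so x = 38.664, 40.386 and median = 39.525).
Ideally, the result of this calculation would be returned as a new object with one row per class.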
Thanks for your help!
Cheers
Answer 0 (score: 1)
Using the dplyr and tidyr libraries, you can do the following:
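library(dplyr)
library(tidyr)

data %>%
  group_by(y) %>%
  # keep the first two and the last two rows of each group
  slice(c(1, 2, n(), n() - 1)) %>%
  group_by(y) %>%
  # rows 1-2 of the slice are the start of the group, rows 3-4 the end
  mutate(firstGroup = ifelse(row_number(y) < 3, 'medianGroup1', 'medianGroup2')) %>%
  group_by(y, firstGroup) %>%
  summarise(medianValue = median(x)) %>%
  # one column per label, one row per class
  spread(firstGroup, medianValue)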
The output is as follows:
Source: local data frame [2 x 3]

       y medianGroup1 medianGroup2
  (fctr)        (dbl)        (dbl)
1      a      38.7595      40.2905
2      b      40.2905      39.5250
Note that I show each step explicitly in the code, but it can be condensed further.
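As one possible way to condense it, here is a minimal sketch that skips the slice/label/spread steps and takes the median of the first two and last two values of x directly inside summarise(). It is not the answer's original code. It assumes the rows are already in time order within each class, that every class has at least two rows, and it rebuilds only the x and y columns of the sample data so that it runs on its own. The name `result` is just an illustration.

```r
library(dplyr)

# Sample data from the question (Time omitted; rows are assumed to be in time order)
x <- c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386,
       40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
data <- data.frame(x, y)

# `result` is a hypothetical name: one row per class, one column per edge
result <- data %>%
  group_by(y) %>%
  summarise(
    medianGroup1 = median(head(x, 2)),  # first two observations of the class
    medianGroup2 = median(tail(x, 2))   # last two observations of the class
  )

print(result)
```

On the sample data this should give the same medians as the output shown above. If the rows might not be sorted, adding arrange(Time) before group_by(y) would be a reasonable precaution.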