I want to count the number of consecutive days on which my variable is above 50. My dataset looks like this:
dp <- dput(head(df, 20))
dp = structure(list(day = 1:20, month = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), year = c(1990L,
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L,
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L,
1990L), variable = c(46.8, 51.3, 51.2, 51.9, 51.4, 50.9, 51.4,
51.6, 51.5, 49.9, 49.4, 49.1, 51.7, 51.8, 50.9, 51, 51.9, 52.5,
52.5, 49.1)), .Names = c("day", "month", "year", "variable"), row.names = c(NA,
20L), class = "data.frame")
Many thanks in advance.
Answer 0 (score: 3)
You can use rle and its inverse, inverse.rle. I use data.table here for its simple grouping:
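fun <- function(x, lim) {
  # Run-length encode the indicator "value is above lim"
  z <- rle(x > lim)
  # Keep only the longest run of TRUEs; FALSE runs get weight 0
  keep <- which.max(z$lengths * z$values)
  z$values[-keep] <- FALSE
  # Expand back to a logical vector marking the longest streak
  inverse.rle(z)
}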
library(data.table)
setDT(dp)
dp[, {
  ind <- fun(variable, 50)
  list(count = sum(ind), start_day = day[ind][1], end_day = tail(day[ind], 1))
}, by = .(month, year)]
#    month year count start_day end_day
# 1:     1 1990     8         2       9
Obviously, your example data all come from the same month, so the result has only one row.
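To see what rle is doing, it may help to look at the run-length encoding of the sample data directly. The following is only a base R sketch: the vector `x` is the `variable` column copied by hand from the question, and the names `above`, `runs`, and `true_runs` are made up for illustration.

```r
# Values of `variable` copied from the question's sample data
x <- c(46.8, 51.3, 51.2, 51.9, 51.4, 50.9, 51.4, 51.6, 51.5, 49.9,
       49.4, 49.1, 51.7, 51.8, 50.9, 51, 51.9, 52.5, 52.5, 49.1)

above <- x > 50          # TRUE on days above the threshold
runs  <- rle(above)      # lengths and values of each consecutive run

runs$lengths             # 1 8 3 7 1
runs$values              # FALSE TRUE FALSE TRUE FALSE

# Lengths of the runs that are above the threshold
true_runs <- runs$lengths[runs$values]
true_runs                # 8 7
max(true_runs)           # 8, the longest streak

# inverse.rle() expands the encoding back to the original logical vector
identical(inverse.rle(runs), above)   # TRUE
```

If you need every streak rather than only the longest one, `true_runs` already holds their lengths. Note that `max(true_runs)` would warn and return -Inf if no day exceeds the threshold, so that case needs its own check.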