Question

I want to count the number of consecutive days on which my variable is above 50. My dataset looks like this:

dp <- dput(head(df, 20))

dp = structure(list(day = 1:20, month = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), year = c(1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L), variable = c(46.8, 51.3, 51.2, 51.9, 51.4, 50.9, 51.4, 
51.6, 51.5, 49.9, 49.4, 49.1, 51.7, 51.8, 50.9, 51, 51.9, 52.5, 
52.5, 49.1)), .Names = c("day", "month", "year", "variable"), row.names = c(NA, 
20L), class = "data.frame")

Many thanks in advance.

Answer 1

You can use rle and its inverse, inverse.rle. I use data.table here only because it makes the grouping simple:

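fun <- function(x, lim) {
  y <- x > lim                 # TRUE where the value exceeds the threshold
  z <- rle(y)                  # run-length encode the TRUE/FALSE runs
  # Pick the longest run of TRUE; weighting by z$values stops a longer
  # run of FALSE from being selected instead
  keep <- which.max(z$lengths * z$values)
  z$values[-keep] <- FALSE     # blank out every other run
  inverse.rle(z)               # expand back to a full-length logical vector
}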
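
To see what rle and inverse.rle contribute to the function above, here is a small sketch on the variable column from the question. The names x and r are only illustrative and do not appear in the original answer.

```r
# The `variable` column from the question's sample data
x <- c(46.8, 51.3, 51.2, 51.9, 51.4, 50.9, 51.4, 51.6, 51.5, 49.9,
       49.4, 49.1, 51.7, 51.8, 50.9, 51, 51.9, 52.5, 52.5, 49.1)

r <- rle(x > 50)   # run-length encoding of the logical vector
r$lengths          # 1 8 3 7 1
r$values           # FALSE TRUE FALSE TRUE FALSE

# inverse.rle() expands the runs back into the original logical vector
identical(inverse.rle(r), x > 50)   # TRUE
```

The longest TRUE run has length 8, which matches the count reported for days 2 to 9.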

library(data.table)
setDT(dp)
dp[, {
  ind <- fun(variable, 50)
  list(count = sum(ind), start_day = day[ind][1], end_day = tail(day[ind], 1))
}, by = .(month, year)]
#   month year count start_day end_day
#1:     1 1990     8         2       9

Obviously, your example data all comes from a single month.
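
If every run above the threshold is of interest rather than only the longest one, the same rle output can be turned into a table of runs. This is a minimal base R sketch, not part of the original answer; run_end, run_start and runs are hypothetical names, and it assumes the data has no missing values and belongs to a single group.

```r
# Sample data from the question (one month, so no grouping is needed here)
day <- 1:20
x <- c(46.8, 51.3, 51.2, 51.9, 51.4, 50.9, 51.4, 51.6, 51.5, 49.9,
       49.4, 49.1, 51.7, 51.8, 50.9, 51, 51.9, 52.5, 52.5, 49.1)

r <- rle(x > 50)
run_end   <- cumsum(r$lengths)            # last position of each run
run_start <- run_end - r$lengths + 1      # first position of each run

# One row per run, keeping only the runs where the value is above 50
runs <- data.frame(start_day = day[run_start],
                   end_day   = day[run_end],
                   length    = r$lengths)[r$values, ]

runs$start_day   # 2 13
runs$end_day     # 9 19
runs$length      # 8 7
```

For data spanning several months, the same steps could be applied within each month and year group.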
