Count consecutive months by group

Time: 2018-01-24 03:17:33

Tags: r

Let's say I have the following data.

ddf <- structure(list(year_month = c("2016-10", "2016-11", "2016-12", 
"2017-01", "2017-02", "2017-05", "2017-08", "2017-09", "2016-10", 
"2016-11", "2016-12", "2017-01"), site_owner = c("Adam", 
"Adam", "Adam", "Adam", "Adam", "Adam", 
"Allison", "Allison", "Allison", "Allison", 
"Allison", "Allison"), N = c(4L, 10L, 4L, 11L, 8L, 
15L, 8L, 7L, 2L, 5L, 6L, 2L)), .Names = c("year_month", "site_owner", 
"N"), row.names = c(NA, -12L), class = c("data.table", "data.frame"
))

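I want to find the number of consecutive months for each group/person.

To get the desired output, I need to find the difference between the current month and the previous month.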

ddf$year_month = as.Date(paste(ddf$year_month, "01", sep="-"))
ddf
ddf[, diffa := year_month-shift(year_month), .(site_owner)]
ddf
ddf[, diffs := (year_month-shift(year_month))/(365.25/12), .(site_owner)]
ddf

This doesn't seem to work.

If I could find the difference, I could then get the count by doing this.

ddf[diffa==1, .N, by=.(site_owner)]

This is the desired output.

name      consecutive months
adam      5 
allison   6

1 answer:

Answer 0: (score: 1)

This may help:

library(zoo)
library(data.table)
# For each owner, count the rows whose next row is exactly one month later
setDT(ddf)[, {
    v1 <- zoo::as.yearmon(year_month)
    .(consecutive_months = sum((v1 + 1/12) == shift(v1, type = "lead"), na.rm = TRUE))
}, by = site_owner]
#   site_owner consecutive_months
#1:       Adam                  4
#2:    Allison                  5

Note: this assumes the 'year_month' values for rows 9:12 are '2017-10', '2017-11', '2017-12', '2018-01'.
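If you would rather not depend on zoo, the same count can probably be obtained with an integer month index. The following is only a sketch under the same assumption about rows 9:12; the column name `month_idx` and the result name `res` are made up for illustration. It converts each "YYYY-MM" string to `year * 12 + month`, so two adjacent months always differ by exactly 1, which avoids the uneven day counts that come from subtracting Date values.

```r
library(data.table)

# Data from the question, with rows 9:12 replaced as assumed in the note above
ddf <- data.table(
  year_month = c("2016-10", "2016-11", "2016-12", "2017-01", "2017-02", "2017-05",
                 "2017-08", "2017-09", "2017-10", "2017-11", "2017-12", "2018-01"),
  site_owner = rep(c("Adam", "Allison"), each = 6),
  N = c(4L, 10L, 4L, 11L, 8L, 15L, 8L, 7L, 2L, 5L, 6L, 2L)
)

# Hypothetical helper column: year * 12 + month, so adjacent months differ by 1
ddf[, month_idx := as.integer(substr(year_month, 1, 4)) * 12L +
                   as.integer(substr(year_month, 6, 7))]

# Order within each owner, then count the steps of exactly one month
setorder(ddf, site_owner, month_idx)
res <- ddf[, .(consecutive_months = sum(diff(month_idx) == 1L)), by = site_owner]
print(res)
```

Like the zoo version, this counts month-to-month transitions, so it gives 4 for Adam and 5 for Allison. A single run of k months contains k - 1 transitions, which is why these numbers are one less than the 5 and 6 shown in the question's desired output.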

Data