Split time periods by calendar interval and ID

Time: 2014-03-02 13:35:26

Tags: r date datetime split calendar

I have a dataset of event episodes grouped by country. It looks like this:

Data <- data.frame(EpStart=c("2010-01-01 00:00:00", "2009-01-01 00:00:00", "2009-01-01 00:00:00", "2006-01-01 00:00:00"), EpEnd=c("2011-01-01 00:03:00", "2013-01-01 00:00:00", "2012-01-01 00:00:00", "2011-01-01 00:00:00"), countryID=c("US","US", "CAN","CAN"))

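I want to create a data frame that splits the episodes into yearly calendar intervals, grouped by countryID, and counts how many episodes are ongoing in each year. The result should look like this: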

CountryID Year Ongoing
1         US 2009       1
2         US 2010       2
3         US 2011       1
4         US 2012       1
5        CAN 2006       1
6        CAN 2007       1
7        CAN 2008       1
8        CAN 2009       2
9        CAN 2010       2
10       CAN 2011       1

I tried the example provided here, but I could not find a way to keep the CountryID when splitting the data.

tmp <- do.call(c, apply(Data, 1, 
                        function(x) head(seq(from = as.POSIXct(x[1]), 
                                             to = as.POSIXct(x[2]),by = "years"), 
                                         -1)))

tmp <- sapply(split(tmp, format(tmp, format = "%Y")), length)

Ongoing <- data.frame(Date=names(tmp), Ongoing = tmp, row.names=NULL)

This returns the following, but it does not split the data by CountryID:

> Ongoing
  Date Ongoing
1 2006       1
2 2007       1
3 2008       1
4 2009       3
5 2010       4
6 2011       2
7 2012       1

1 answer:

Answer 0 (score: 0)

Something like this seems to work:

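# Calendar year in which each episode starts and ends
Data$Start = as.numeric(format(as.Date(Data$EpStart, "%Y-%m-%d"), "%Y"))
Data$End = as.numeric(format(as.Date(Data$EpEnd, "%Y-%m-%d"), "%Y"))
# Per country: expand each episode into the years it covers (end year excluded), then tabulate
res = do.call(rbind,  
          lapply(split(Data, Data$countryID), 
                 function(x) 
                    as.data.frame(table(unlist(mapply(`:`, x$Start, x$End-1))))))
# Row names look like "CAN.1"; the part before the dot is the country
data.frame(CountryID = unlist(lapply(strsplit(row.names(res), ".", fixed = TRUE), `[`, 1)), 
           Year = res$Var1,
           Ongoing = res$Freq, stringsAsFactors = FALSE)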
#   CountryID Year Ongoing
#1        CAN 2006       1
#2        CAN 2007       1
#3        CAN 2008       1
#4        CAN 2009       2
#5        CAN 2010       2
#6        CAN 2011       1
#7         US 2009       1
#8         US 2010       2
#9         US 2011       1
#10        US 2012       1
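
The same idea can also be sketched without recovering the country from row names. The version below is only one possible variant, not the answerer's code: it builds one row per episode and year, then counts rows per country and year with `aggregate`. The names `start_year`, `end_year`, `years`, `long` and `result` are made up for illustration. It assumes the timestamps always begin with a four-digit year and, like the answer above, treats the end year as excluded. Using `seq_len` instead of `:` means an episode that starts and ends in the same calendar year contributes no rows, whereas `2010:2009` would count down and yield 2010 and 2009.

```r
# Sample data from the question (kept as character, not factor)
Data <- data.frame(
  EpStart = c("2010-01-01 00:00:00", "2009-01-01 00:00:00",
              "2009-01-01 00:00:00", "2006-01-01 00:00:00"),
  EpEnd = c("2011-01-01 00:03:00", "2013-01-01 00:00:00",
            "2012-01-01 00:00:00", "2011-01-01 00:00:00"),
  countryID = c("US", "US", "CAN", "CAN"),
  stringsAsFactors = FALSE
)

# Calendar year of each boundary (assumes a leading four-digit year)
start_year <- as.integer(substr(Data$EpStart, 1, 4))
end_year   <- as.integer(substr(Data$EpEnd, 1, 4))

# Years covered by each episode, end year excluded.
# seq_len(0) is empty, so a same-year episode yields no years.
years <- Map(function(s, e) s + seq_len(max(e - s, 0)) - 1L,
             start_year, end_year)

# Long format: one row per episode-year, with the country repeated alongside
long <- data.frame(
  CountryID = rep(Data$countryID, lengths(years)),
  Year = unlist(years),
  Ongoing = 1L,
  stringsAsFactors = FALSE
)

# Sum the rows per country and year, then sort for readability
result <- aggregate(Ongoing ~ CountryID + Year, data = long, FUN = sum)
result <- result[order(result$CountryID, result$Year), ]
rownames(result) <- NULL
result
```

For the sample data this should give CAN rows for 2006 to 2011 with counts 1, 1, 1, 2, 2, 1 and US rows for 2009 to 2012 with counts 1, 2, 1, 1, which are the same counts as in the table requested in the question, only with the countries in alphabetical order.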