I have a dataset of event episodes grouped by country. It looks like this:
Data <- data.frame(EpStart = c("2010-01-01 00:00:00", "2009-01-01 00:00:00",
                               "2009-01-01 00:00:00", "2006-01-01 00:00:00"),
                   EpEnd = c("2011-01-01 00:03:00", "2013-01-01 00:00:00",
                             "2012-01-01 00:00:00", "2011-01-01 00:00:00"),
                   countryID = c("US", "US", "CAN", "CAN"))
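I want to create a data frame that splits the data into yearly calendar intervals, grouped by countryID, and counts how many episodes are ongoing in each year. The result should look like this: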
CountryID Year Ongoing
1 US 2009 1
2 US 2010 2
3 US 2011 1
4 US 2012 1
5 CAN 2006 1
6 CAN 2007 1
7 CAN 2008 1
8 CAN 2009 2
9 CAN 2010 2
10 CAN 2011 1
I have tried the example provided by @ here, but I could not find any way to keep CountryID when splitting the data:
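# One date per year for each episode, dropping the final step (the end date)
tmp <- do.call(c, apply(Data, 1,
                        function(x) head(seq(from = as.POSIXct(x[1]),
                                             to = as.POSIXct(x[2]),
                                             by = "years"),
                                         -1)))
# Count the dates in each calendar year (countryID is no longer available here)
tmp <- sapply(split(tmp, format(tmp, format = "%Y")), length)
Ongoing <- data.frame(Date = names(tmp), Ongoing = tmp, row.names = NULL)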
This returns the following, but it does not split the data by CountryID:
> Ongoing
Date Ongoing
1 2006 1
2 2007 1
3 2008 1
4 2009 3
5 2010 4
6 2011 2
7 2012 1
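The question's approach loses countryID because the function passed to `apply()` returns only the generated dates. A minimal sketch of one way to keep it, assuming the `Data` frame above and UTC timestamps: generate the yearly steps per episode as in the question, repeat each episode's countryID once per ongoing year with `rep(..., lengths(...))`, and count with base R `aggregate()`. The names `years` and `long` are illustrative only.

```r
# Sample data from the question
Data <- data.frame(EpStart = c("2010-01-01 00:00:00", "2009-01-01 00:00:00",
                               "2009-01-01 00:00:00", "2006-01-01 00:00:00"),
                   EpEnd = c("2011-01-01 00:03:00", "2013-01-01 00:00:00",
                             "2012-01-01 00:00:00", "2011-01-01 00:00:00"),
                   countryID = c("US", "US", "CAN", "CAN"),
                   stringsAsFactors = FALSE)

# For each episode: yearly steps from EpStart to EpEnd, dropping the last
# step as in the question, then keep only the calendar year
years <- lapply(seq_len(nrow(Data)), function(i) {
  steps <- seq(from = as.POSIXct(Data$EpStart[i], tz = "UTC"),
               to = as.POSIXct(Data$EpEnd[i], tz = "UTC"),
               by = "year")
  as.integer(format(head(steps, -1), "%Y"))
})

# One row per (episode, year): repeat each countryID once per ongoing year
long <- data.frame(CountryID = rep(Data$countryID, lengths(years)),
                   Year = unlist(years),
                   stringsAsFactors = FALSE)

# Count ongoing episodes per country and year
Ongoing <- aggregate(data.frame(Ongoing = rep(1L, nrow(long))),
                     by = list(CountryID = long$CountryID, Year = long$Year),
                     FUN = sum)
Ongoing <- Ongoing[order(Ongoing$CountryID, Ongoing$Year), ]
rownames(Ongoing) <- NULL
print(Ongoing)
```

Because this keeps the question's year logic (`head(..., -1)` on the yearly sequence), an episode shorter than one year contributes no rows.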
Answer:
Something like this seems to work:
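# Extract the start and end year of each episode
Data$Start = as.numeric(format(as.Date(Data$EpStart, "%Y-%m-%d"), "%Y"))
Data$End = as.numeric(format(as.Date(Data$EpEnd, "%Y-%m-%d"), "%Y"))
# Per country: list the years each episode is ongoing (Start to End - 1)
# and count how many episodes cover each year
res = do.call(rbind,
              lapply(split(Data, Data$countryID),
                     function(x)
                       as.data.frame(table(unlist(mapply(`:`, x$Start, x$End - 1))))))
# rbind() names the rows "CAN.1", "CAN.2", ...; recover the country from the prefix
data.frame(CountryID = unlist(lapply(strsplit(row.names(res), ".", fixed = TRUE), `[`, 1)),
           Year = as.numeric(as.character(res$Var1)),
           Ongoing = res$Freq, stringsAsFactors = FALSE)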
# CountryID Year Ongoing
#1 CAN 2006 1
#2 CAN 2007 1
#3 CAN 2008 1
#4 CAN 2009 2
#5 CAN 2010 2
#6 CAN 2011 1
#7 US 2009 1
#8 US 2010 2
#9 US 2011 1
#10 US 2012 1
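One caveat not covered by the sample data: `Start:(End - 1)` counts downward when an episode starts and ends in the same calendar year (`2010:2009` yields both 2010 and 2009), so such an episode would be counted in two years. A minimal guard, using a hypothetical helper `active_years()` that assumes an episode is ongoing from its start year up to, but not including, its end year, could replace the `` `:` `` call inside `mapply()`:

```r
# Hypothetical helper: calendar years from start_year up to, but not
# including, end_year; empty when the episode does not cross a year boundary.
active_years <- function(start_year, end_year) {
  start_year + seq_len(max(0L, end_year - start_year)) - 1L
}

# The unguarded `:` form counts downward for a same-year episode
print(2010:(2010 - 1))             # [1] 2010 2009
print(active_years(2010L, 2010L))  # integer(0)
print(active_years(2009L, 2013L))  # [1] 2009 2010 2011 2012

# Vectorised over episodes, the way mapply(`:`, ...) is used in the answer
print(unlist(mapply(active_years, c(2010L, 2009L), c(2011L, 2013L))))
# [1] 2010 2009 2010 2011 2012
```

In the answer's pipeline this would read `table(unlist(mapply(active_years, x$Start, x$End)))`; for the sample data the result is unchanged.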