我有一个包含2个日期(开始日期和结束日期)和持续时间(结束日期 - 开始日期)的项目列表。我想将它们切成垃圾箱以显示每个垃圾箱中的“活动项目”的数量,即,如果开始日期< = bin日期和结束日期> bin日期,该项应该在bin中计算。
Item StartDate EndDate Duration
Machine1 2005/01/21 2011/03/29 2258
Machine2 2004/05/12 2012/05/08 2918
Machine3 2004/10/15 2005/09/10 330
Machine4 2004/08/30 2011/08/02 2528
Machine5 2005/06/06 2010/12/03 2006
Machine6 2004/05/11 2007/03/17 1040
Machine7 2005/08/09 2011/05/30 2120
Machine8 2005/01/06 2012/06/07 2709
Machine9 2005/06/13 2008/08/28 1172
Machine10 2005/06/28 2010/04/08 1745
Machine11 2004/11/09 2007/05/14 916
Machine12 2005/05/26 2012/09/16 2670
Machine13 2004/05/28 2009/06/09 1838
Machine14 2005/01/06 2012/05/25 2696
Machine15 2005/08/20 2012/02/11 2366
Machine16 2004/08/02 2011/10/23 2638
Machine17 2004/08/10 2009/03/15 1678
Machine18 2005/05/08 2006/04/17 344
Machine19 2005/08/26 2006/07/24 332
Machine20 2004/03/30 2006/05/07 768
我要生成的Bin计数:
2004/01/01 0
2005/01/01 9
2006/01/01 19
2007/01/01 16
2008/01/01 14
2009/01/01 13
2010/01/01 11
2011/01/01 9
2012/01/01 5
2013/01/01 0
正如您所看到的,这些垃圾箱的总数并不等于项目总数,正如您对传统直方图所期望的那样。
我可以用一些冗长的代码来做到这一点,但我确信必须有一些简短的方法,使用cut或split。根据我上面的定义,我知道bin标签已经关闭了,但是现在让我们忽略它。
答案 0 :(得分:2)
方法是:
#turn dates to actual dates
DF$StartDate <- as.Date(DF$StartDate, "%Y/%m/%d")
DF$EndDate <- as.Date(DF$EndDate, "%Y/%m/%d")
binDF[,1] <- as.Date(binDF[,1], "%Y/%m/%d")
counts <- colSums(sapply(binDF[,1], function(x) {DF$StartDate <= x & DF$EndDate > x}))
#> counts
#[1] 0 9 19 16 14 13 11 9 5 0
作为完整的数据框:
resDF <- data.frame(dates = binDF[,1], counts = counts, stringsAsFactors = F)
#> resDF
# dates counts
#1 2004-01-01 0
#2 2005-01-01 9
#3 2006-01-01 19
#4 2007-01-01 16
#5 2008-01-01 14
#6 2009-01-01 13
#7 2010-01-01 11
#8 2011-01-01 9
#9 2012-01-01 5
#10 2013-01-01 0
数据框DF
和binDF
:
DF <- structure(list(Item = c("Machine1", "Machine2", "Machine3", "Machine4",
"Machine5", "Machine6", "Machine7", "Machine8", "Machine9", "Machine10",
"Machine11", "Machine12", "Machine13", "Machine14", "Machine15",
"Machine16", "Machine17", "Machine18", "Machine19", "Machine20"
), StartDate = c("2005/01/21", "2004/05/12", "2004/10/15", "2004/08/30",
"2005/06/06", "2004/05/11", "2005/08/09", "2005/01/06", "2005/06/13",
"2005/06/28", "2004/11/09", "2005/05/26", "2004/05/28", "2005/01/06",
"2005/08/20", "2004/08/02", "2004/08/10", "2005/05/08", "2005/08/26",
"2004/03/30"), EndDate = c("2011/03/29", "2012/05/08", "2005/09/10",
"2011/08/02", "2010/12/03", "2007/03/17", "2011/05/30", "2012/06/07",
"2008/08/28", "2010/04/08", "2007/05/14", "2012/09/16", "2009/06/09",
"2012/05/25", "2012/02/11", "2011/10/23", "2009/03/15", "2006/04/17",
"2006/07/24", "2006/05/07"), Duration = c(2258L, 2918L, 330L,
2528L, 2006L, 1040L, 2120L, 2709L, 1172L, 1745L, 916L, 2670L,
1838L, 2696L, 2366L, 2638L, 1678L, 344L, 332L, 768L)), .Names = c("Item",
"StartDate", "EndDate", "Duration"), class = "data.frame", row.names = c(NA,
-20L))
binDF <- structure(list(V1 = c("2004/01/01", "2005/01/01", "2006/01/01",
"2007/01/01", "2008/01/01", "2009/01/01", "2010/01/01", "2011/01/01",
"2012/01/01", "2013/01/01"), V2 = c(0L, 9L, 19L, 16L, 14L, 13L,
11L, 9L, 5L, 0L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-10L))