Counting active items in bins

Time: 2013-11-13 08:45:22

Tags: r distribution frequency cut

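I have a list of items, each with two dates (a start date and an end date) and a duration (end date minus start date, in days). I want to cut them into bins to show the number of "active items" in each bin. That is, an item should be counted in a bin if start date <= bin date and end date > bin date.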

Item        StartDate     EndDate       Duration
Machine1    2005/01/21    2011/03/29    2258
Machine2    2004/05/12    2012/05/08    2918
Machine3    2004/10/15    2005/09/10    330
Machine4    2004/08/30    2011/08/02    2528
Machine5    2005/06/06    2010/12/03    2006
Machine6    2004/05/11    2007/03/17    1040
Machine7    2005/08/09    2011/05/30    2120
Machine8    2005/01/06    2012/06/07    2709
Machine9    2005/06/13    2008/08/28    1172
Machine10   2005/06/28    2010/04/08    1745
Machine11   2004/11/09    2007/05/14    916
Machine12   2005/05/26    2012/09/16    2670
Machine13   2004/05/28    2009/06/09    1838
Machine14   2005/01/06    2012/05/25    2696
Machine15   2005/08/20    2012/02/11    2366
Machine16   2004/08/02    2011/10/23    2638
Machine17   2004/08/10    2009/03/15    1678
Machine18   2005/05/08    2006/04/17    344
Machine19   2005/08/26    2006/07/24    332
Machine20   2004/03/30    2006/05/07    768

The bin counts I want to produce:

2004/01/01  0
2005/01/01  9
2006/01/01  19
2007/01/01  16
2008/01/01  14
2009/01/01  13
2010/01/01  11
2011/01/01  9
2012/01/01  5
2013/01/01  0

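As you can see, the bin counts do not add up to the total number of items, as they would in a traditional histogram, because one item can be active in several bins.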

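I can do this with some lengthy code, but I am sure there must be a short way using cut or split. I know the bin labels are off given my definition above, but let's ignore that for now.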

1 answer:

Answer 0 (score: 2)

One way to do it:

# Convert the date strings to Date objects (DF and binDF are defined at the end of this answer)
DF$StartDate <- as.Date(DF$StartDate, "%Y/%m/%d")
DF$EndDate <- as.Date(DF$EndDate, "%Y/%m/%d")
binDF[,1] <- as.Date(binDF[,1], "%Y/%m/%d")

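# For each bin date, flag the items active on it (started on or before, ended after),
# then sum each column of the resulting items-by-bins logical matrix
counts <- colSums(sapply(binDF[,1], function(x) {DF$StartDate <= x & DF$EndDate > x}))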
#> counts
#[1]  0  9 19 16 14 13 11  9  5  0
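
The same comparison can also be written without an explicit anonymous function. The following is only a sketch of an alternative, not part of the original answer: it uses `outer` to build the items-by-bins logical matrices in one step. To keep it self-contained it uses just the first five rows of the question's data, and the names `start`, `end`, `bins`, `started`, `notEnded` and `counts_outer` are made up for illustration.

```r
# First five rows of the question's data (Machine1..Machine5), as Date vectors
start <- as.Date(c("2005/01/21", "2004/05/12", "2004/10/15",
                   "2004/08/30", "2005/06/06"), "%Y/%m/%d")
end   <- as.Date(c("2011/03/29", "2012/05/08", "2005/09/10",
                   "2011/08/02", "2010/12/03"), "%Y/%m/%d")

# Bin dates: 1 January of each year from 2004 to 2013
bins <- seq(as.Date("2004-01-01"), as.Date("2013-01-01"), by = "year")

# as.numeric() turns a Date into days since 1970-01-01, so outer() compares plain numbers.
# Each result is a logical matrix with one row per item and one column per bin.
started  <- outer(as.numeric(start), as.numeric(bins), "<=")
notEnded <- outer(as.numeric(end),   as.numeric(bins), ">")

# An item is active in a bin when both conditions hold; column sums give the counts
counts_outer <- colSums(started & notEnded)
counts_outer
# 0 3 4 4 4 4 4 3 1 0
```

Like the `sapply` version, this builds a full items-by-bins matrix, so memory use grows with the product of the two lengths.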

As a complete data frame:

resDF <- data.frame(dates = binDF[,1], counts = counts, stringsAsFactors = FALSE)
#> resDF
#        dates counts
#1  2004-01-01      0
#2  2005-01-01      9
#3  2006-01-01     19
#4  2007-01-01     16
#5  2008-01-01     14
#6  2009-01-01     13
#7  2010-01-01     11
#8  2011-01-01      9
#9  2012-01-01      5
#10 2013-01-01      0
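
For larger inputs, a cumulative-count formulation may be worth considering. This is a hedged sketch rather than the answerer's method: assuming every end date is on or after its start date, the number of active items at a bin date equals the number of items started by that date minus the number already ended by it. `findInterval` gives both cumulative counts from sorted vectors. The example again uses only the first five rows of the question's data, and `nStarted`, `nEnded` and `counts_fi` are illustrative names.

```r
# First five rows of the question's data (Machine1..Machine5), as Date vectors
start <- as.Date(c("2005/01/21", "2004/05/12", "2004/10/15",
                   "2004/08/30", "2005/06/06"), "%Y/%m/%d")
end   <- as.Date(c("2011/03/29", "2012/05/08", "2005/09/10",
                   "2011/08/02", "2010/12/03"), "%Y/%m/%d")
bins  <- seq(as.Date("2004-01-01"), as.Date("2013-01-01"), by = "year")

# Assumption: no item ends before it starts; otherwise the subtraction below is wrong
stopifnot(all(end >= start))

# findInterval(x, v) returns, for each x, how many elements of the sorted vector v are <= x
nStarted <- findInterval(as.numeric(bins), sort(as.numeric(start)))  # StartDate <= bin
nEnded   <- findInterval(as.numeric(bins), sort(as.numeric(end)))    # EndDate   <= bin

# Active = started by the bin date and not yet ended (EndDate > bin)
counts_fi <- nStarted - nEnded
counts_fi
# 0 3 4 4 4 4 4 3 1 0
```

This avoids the items-by-bins matrix entirely, at the cost of relying on the stated assumption about the dates.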

The data frames DF and binDF used above:

DF <- structure(list(Item = c("Machine1", "Machine2", "Machine3", "Machine4", 
"Machine5", "Machine6", "Machine7", "Machine8", "Machine9", "Machine10", 
"Machine11", "Machine12", "Machine13", "Machine14", "Machine15", 
"Machine16", "Machine17", "Machine18", "Machine19", "Machine20"
), StartDate = c("2005/01/21", "2004/05/12", "2004/10/15", "2004/08/30", 
"2005/06/06", "2004/05/11", "2005/08/09", "2005/01/06", "2005/06/13", 
"2005/06/28", "2004/11/09", "2005/05/26", "2004/05/28", "2005/01/06", 
"2005/08/20", "2004/08/02", "2004/08/10", "2005/05/08", "2005/08/26", 
"2004/03/30"), EndDate = c("2011/03/29", "2012/05/08", "2005/09/10", 
"2011/08/02", "2010/12/03", "2007/03/17", "2011/05/30", "2012/06/07", 
"2008/08/28", "2010/04/08", "2007/05/14", "2012/09/16", "2009/06/09", 
"2012/05/25", "2012/02/11", "2011/10/23", "2009/03/15", "2006/04/17", 
"2006/07/24", "2006/05/07"), Duration = c(2258L, 2918L, 330L, 
2528L, 2006L, 1040L, 2120L, 2709L, 1172L, 1745L, 916L, 2670L, 
1838L, 2696L, 2366L, 2638L, 1678L, 344L, 332L, 768L)), .Names = c("Item", 
"StartDate", "EndDate", "Duration"), class = "data.frame", row.names = c(NA, 
-20L))

binDF <- structure(list(V1 = c("2004/01/01", "2005/01/01", "2006/01/01", 
"2007/01/01", "2008/01/01", "2009/01/01", "2010/01/01", "2011/01/01", 
"2012/01/01", "2013/01/01"), V2 = c(0L, 9L, 19L, 16L, 14L, 13L, 
11L, 9L, 5L, 0L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-10L))