R: Monthly binning based on source date ranges

Time: 2017-08-21 11:10:45

Tags: r date

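The binning examples I've found for R seem to assume the source data has a single date (or date/time) per row. I have discrete start and stop dates for user accounts spanning 2002 to 2017. I'd like to output the number of active accounts in monthly bins across the whole 2002-17 range.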

The data is currently dd/mm/yyyy strings, but I can easily change the format if needed; rows are sorted by ascending start date. For example:

Start       Stop
04/09/2006  23/01/2014
...
06/07/2008  11/03/2017
...
30/09/2010  22/04/2016

The resulting counts would be, for example:

Mar 2006    0
Jan 2007    1
Mar 2011    3
Jun 2015    2
Sep 2016    1
...etc.

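The purpose of generating the counts is to plot total active accounts over time. I'm open to producing daily counts and then aggregating by month if that's easier. I'm stuck at the very start: how to bin when the source is a date range rather than a single date.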

2 answers:

Answer 0: (score: 2)

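Convert the columns to "yearmon" class and use mapply to generate the year/months covered, ym. Then count the number of occurrences of each year/month and merge the result with a data frame of all year/months from January 2002 to December 2017, giving M_na. Replacing the NAs with 0 gives M.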

library(zoo)

DF2 <- transform(DF, Start = as.yearmon(Start), Stop = as.yearmon(Stop))

ym <- unlist(mapply(seq, DF2$Start, DF2$Stop, MoreArgs = list(by = 1/12)))
Ag <- aggregate(ym^0, list(ym = as.yearmon(ym)), sum)

M_na <- merge(Ag, data.frame(ym = as.yearmon(seq(2002, 2017+11/12, 1/12))), all.y = TRUE)
M <- transform(M_na, x = replace(x, is.na(x), 0))


plot(x ~ ym, M, type = "h", xlab = "", ylab = "Count", xaxt = "n")
axis(1, 2002:2017)


magrittr

This can also be expressed as a magrittr pipeline like this:

library(magrittr)
library(zoo)

M <- DF %>%
   transform(Start = as.yearmon(Start), Stop = as.yearmon(Stop)) %$%
   unlist(mapply(seq, Start, Stop, MoreArgs = list(by = 1/12))) %>%
   { aggregate(.^0, list(ym = as.yearmon(.)), sum) } %>%
   merge(data.frame(ym = as.yearmon(seq(2002, 2017+11/12, 1/12))), all.y = TRUE) %>%
   transform(x = replace(x, is.na(x), 0))

Note: We assume the following input with Date class columns:

Lines <- "
Start       Stop
04/09/2006  23/01/2014
06/07/2008  11/03/2017
30/09/2010  22/04/2016"
DF <- read.table(text = Lines, header = TRUE)
fmt <- "%d/%m/%Y"
DF <- transform(DF, Start = as.Date(Start, fmt), Stop = as.Date(Stop, fmt))
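For comparison, the same monthly counts can be sketched in base R without zoo. This is only an illustration, not part of the answer above: it assumes the three sample rows from the question, and the names floor_month, months_seq, active and M2 are made up for the example. An account is treated as active in a month if that month lies between its start month and its stop month, inclusive.

```r
# Sample input, parsed as Date (the three rows from the question)
DF <- data.frame(
  Start = as.Date(c("04/09/2006", "06/07/2008", "30/09/2010"), "%d/%m/%Y"),
  Stop  = as.Date(c("23/01/2014", "11/03/2017", "22/04/2016"), "%d/%m/%Y")
)

# Hypothetical helper: first day of the month containing each date
floor_month <- function(d) as.Date(format(d, "%Y-%m-01"))

s <- floor_month(DF$Start)
e <- floor_month(DF$Stop)

# One entry per month, Jan 2002 through Dec 2017
months_seq <- seq(as.Date("2002-01-01"), as.Date("2017-12-01"), by = "month")

# For each month, count the accounts whose start..stop month range covers it
active <- vapply(
  seq_along(months_seq),
  function(i) sum(s <= months_seq[i] & e >= months_seq[i]),
  integer(1)
)

M2 <- data.frame(month = months_seq, count = active)

# Look up a single month, e.g. March 2011
M2[M2$month == as.Date("2011-03-01"), ]
```

This compares every month against every account, which is fine for a few thousand accounts; for much larger inputs the mapply/aggregate approach above may scale better.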

Answer 1: (score: 0)

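If you format the dates as month-year and then apply a factor whose levels are all the month-year values, you should get what you are looking for.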

# creating data for example
dates <- sample(seq(as.Date('01/01/2002', format='%m/%d/%Y'), 
                    as.Date('12/31/2017', format='%m/%d/%Y'), 
                    by="day"), 30)

# use the cut function to round down to the first of each month if you like
months <- format(as.Date(cut(dates, breaks= 'month')), '%b %Y')
# cut function is not necessary if you prefer to skip to the format
months <- format(dates, '%b %Y')

# Create ordered vectors of months and years
ord_months <- c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')
ord_year <- as.character(2002:2017)
# create an ordered vector of month years
months_ordered <- apply(expand.grid(ord_months, ord_year), 1, paste, collapse = ' ')
head(months_ordered)

# factor the format and apply the factored vector as the levels
monthsF <- factor(months, levels=months_ordered)
table(monthsF)
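The snippet above counts single dates, whereas the question has start/stop ranges. One possible way to combine the two, sketched here as an assumption rather than as something this answer states, is to expand each range into one date per covered month and then tabulate with the same factor idea. The names first_of_month, covered, month_levels and counts are hypothetical. The levels are built with format() as well, so labels and levels match whatever locale is in use.

```r
# Sample ranges from the question, parsed as Date
starts <- as.Date(c("04/09/2006", "06/07/2008", "30/09/2010"), "%d/%m/%Y")
stops  <- as.Date(c("23/01/2014", "11/03/2017", "22/04/2016"), "%d/%m/%Y")

# Hypothetical helper: first day of the month containing each date
first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))

# Expand every account into one date per month it covers
covered <- do.call(c, lapply(seq_along(starts), function(i) {
  seq(first_of_month(starts[i]), first_of_month(stops[i]), by = "month")
}))

# All month-year labels from Jan 2002 to Dec 2017, in calendar order
month_levels <- format(
  seq(as.Date("2002-01-01"), as.Date("2017-12-01"), by = "month"),
  "%b %Y"
)

# Tabulate; months with no active account appear with a count of 0
counts <- table(factor(format(covered, "%b %Y"), levels = month_levels))
head(counts)
```

Flooring both ends to the first of the month means an account counts as active in its start month and its stop month, even if it was open for only part of them.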