Getting the number of days in each month between two dates

Time: 2016-03-02 16:01:08

Tags: r data.table

I have a dataset input that I would like to convert into a dataset output. Basically, I am trying to manipulate the dataset so that the number of days between two dates is split across the months they fall in. Can this be done in R?

R code to create both datasets, for easy reference:

library('data.table')

input = data.table(SerialNum=c(1,2),
                   StartDate=c('28/01/2015','28/01/2015'),
                   EndDate=c('03/02/2015','03/03/2015'))
#    SerialNum  StartDate    EndDate
# 1:         1 28/01/2015 03/02/2015
# 2:         2 28/01/2015 03/03/2015

output = data.table(SerialNum=c(1,1,2,2,2),
                    StartDate=c('28/01/2015','28/01/2015','28/01/2015','28/01/2015','28/01/2015'),
                    EndDate=c('03/02/2015','03/02/2015','03/03/2015','03/03/2015','03/03/2015'),
                    MMMYY=c('Jan15','Feb15','Jan15','Feb15','Mar15'),
                    Days=c(4,3,4,28,3))
#    SerialNum  StartDate    EndDate MMMYY Days
# 1:         1 28/01/2015 03/02/2015 Jan15    4
# 2:         1 28/01/2015 03/02/2015 Feb15    3
# 3:         2 28/01/2015 03/03/2015 Jan15    4
# 4:         2 28/01/2015 03/03/2015 Feb15   28
# 5:         2 28/01/2015 03/03/2015 Mar15    3

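3 answers:

Answer 0: (score: 3)

You can do this by creating a sequence of days from StartDate to EndDate and extracting a month-year variable from it (mnth in the example below). Then count the rows by SerialNum and the newly created mnth variable: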

input[, .(mnth = format(seq(StartDate,EndDate,"day"), "%b%y")),
      by = .(SerialNum, StartDate, EndDate)
      ][, .N, by = .(SerialNum, StartDate, EndDate, mnth)]

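which gives you (the %b abbreviations follow the session locale; the output below evidently comes from a Dutch locale, hence mrt15 for March):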

   SerialNum  StartDate    EndDate  mnth  N
1:         1 2015-01-28 2015-02-03 jan15  4
2:         1 2015-01-28 2015-02-03 feb15  3
3:         2 2015-01-28 2015-03-03 jan15  4
4:         2 2015-01-28 2015-03-03 feb15 28
5:         2 2015-01-28 2015-03-03 mrt15  3
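
The month labels produced by format(x, "%b%y") depend on the session locale. If English labels such as Jan15 are wanted regardless of locale, a minimal sketch along these lines may help. It uses base R only, and the helper name mmmyy is hypothetical, not something from the answer:

```r
# Hypothetical helper (not part of the original answer): build "Jan15"-style
# labels from base R's built-in English month.abb vector, so the result does
# not depend on the session locale.
mmmyy <- function(d) {
  lt <- as.POSIXlt(d)                              # $mon is 0-based (0 = January)
  paste0(month.abb[lt$mon + 1], format(d, "%y"))   # "%y" is digits only
}

# Count the days per label for one of the ranges from the question
dates  <- seq(as.Date("2015-01-28"), as.Date("2015-02-03"), by = "day")
labels <- mmmyy(dates)
counts <- table(factor(labels, levels = unique(labels)))  # keep chronological order
print(counts)
```

Inside the data.table call, mmmyy(seq(StartDate, EndDate, "day")) could then take the place of the format(...) expression.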

If your StartDate and EndDate columns are not yet stored as dates, you can convert them with:

input[, `:=` (StartDate = as.Date(StartDate,"%d/%m/%Y"),
              EndDate = as.Date(EndDate,"%d/%m/%Y"))]

# or with the 'lubridate' package like @Titolondon used
library(lubridate)
input[, `:=` (StartDate = dmy(StartDate), EndDate = dmy(EndDate))]

Data used:

input <- data.table(SerialNum = c(1,2),
                    StartDate = as.Date(c('28/01/2015','28/01/2015'),"%d/%m/%Y"),
                    EndDate = as.Date(c('03/02/2015','03/03/2015'),"%d/%m/%Y"))

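Answer 1: (score: 1)

Here is a solution that works on a plain data frame rather than a data.table (lubridate is used only for dmy() and month()). sapply operates on each value of SerialNum separately: we create the sequence of dates from StartDate to EndDate and then count the number of dates that fall within each month. The whole thing is wrapped in do.call(rbind, ...) to turn the resulting list into a single data frame.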

library(lubridate)

input = data.frame(SerialNum=c(1,2),StartDate=c('28/01/2015','28/01/2015'),EndDate=c('03/02/2015','03/03/2015'), 
                   stringsAsFactors=FALSE)

input[,2:3] = lapply(input[,2:3], dmy)

do.call(rbind,
        sapply(unique(input$SerialNum), function(i) {

          start = input[input$SerialNum==i,"StartDate"]
          end = input[input$SerialNum==i, "EndDate"]

          dates = seq(start, end, by="1 day")

          data.frame(SerialNum=i, StartDate=start, EndDate=end, 
                     MMMYY=unique(format(dates, "%b%y")),
                     Days=sapply(split(dates, droplevels(month(dates, label=TRUE))), length))

        }, simplify=FALSE))

     SerialNum  StartDate    EndDate MMMYY Days
Jan          1 2015-01-28 2015-02-03 Jan15    4
Feb          1 2015-01-28 2015-02-03 Feb15    3
Jan1         2 2015-01-28 2015-03-03 Jan15    4
Feb1         2 2015-01-28 2015-03-03 Feb15   28
Mar          2 2015-01-28 2015-03-03 Mar15    3
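
One caveat: the code above splits on month(dates, label=TRUE), which ignores the year, so a range longer than twelve months would pool, say, January 2015 with January 2016. The following is a rough sketch of a variant keyed on year and month, in base R only. The function name days_by_month is hypothetical, and the English labels via month.abb are an assumption about the desired output:

```r
# Hypothetical helper (not from the original answer): count days per
# calendar month for one date range, keyed on "YYYY-MM" so that the same
# month in different years is kept separate.
days_by_month <- function(serial, start, end) {
  dates  <- seq(start, end, by = "day")
  counts <- table(format(dates, "%Y-%m"))          # keys sort chronologically
  first  <- as.Date(paste0(names(counts), "-01"))  # first day of each month
  data.frame(SerialNum = serial, StartDate = start, EndDate = end,
             MMMYY = paste0(month.abb[as.POSIXlt(first)$mon + 1],
                            format(first, "%y")),
             Days = as.vector(counts),
             stringsAsFactors = FALSE)
}

input <- data.frame(SerialNum = c(1, 2),
                    StartDate = as.Date(c("2015-01-28", "2015-01-28")),
                    EndDate   = as.Date(c("2015-02-03", "2015-03-03")))

# Apply row by row and stack the pieces into one data frame
res <- do.call(rbind, lapply(seq_len(nrow(input)), function(i) {
  days_by_month(input$SerialNum[i], input$StartDate[i], input$EndDate[i])
}))
print(res)
```

For the two ranges in the question this should reproduce the Days column 4, 3, 4, 28, 3.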

答案 2 :(得分:1)

Using data.table and lubridate

library(data.table)

input = data.table(
  SerialNum = c(1, 2),
  StartDate = c('28/01/2015', '28/01/2015'),
  EndDate = c('03/02/2015', '03/03/2015')
)

Use lubridate for the date manipulation

library(lubridate)

If your columns are not already dates, convert them with a lubridate function (dmy() returns Date objects)
input[, StartDate := dmy(StartDate)]
input[, EndDate := dmy(EndDate)]

The trick: create the sequence of dates between StartDate and EndDate, by SerialNum
DT <- input[, .(seqDate = StartDate + days(0:(EndDate - StartDate))), 
            by = .(SerialNum, StartDate, EndDate)]

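Derive the MMMYY column from this new date sequence. I use month.abb to get English abbreviations, but if your locale is set appropriately you can use MMMYY = format(seqDate, "%b%y") instead.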

DT[, MMMYY := paste0(month.abb[month(seqDate)], format(seqDate, "%y"))]

Count the days per month (column MMMYY)

output = DT[, .(Days = .N), by = .(SerialNum, StartDate, EndDate, MMMYY)]
output
#>    SerialNum  StartDate    EndDate MMMYY Days
#> 1:         1 2015-01-28 2015-02-03 Jan15    4
#> 2:         1 2015-01-28 2015-02-03 Feb15    3
#> 3:         2 2015-01-28 2015-03-03 Jan15    4
#> 4:         2 2015-01-28 2015-03-03 Feb15   28
#> 5:         2 2015-01-28 2015-03-03 Mar15    3
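
All three answers expand each range into one row per day and then count. For very long ranges, a different technique (not taken from the answers) is to compute each month's overlap with the range arithmetically. A minimal base R sketch, where month_overlap is a hypothetical name and the dates are assumed to be Date objects with start <= end:

```r
# Hypothetical helper (an alternative to the answers' day-by-day expansion):
# compute the inclusive number of days of [start, end] in each calendar month.
month_overlap <- function(start, end) {
  first_start <- as.Date(format(start, "%Y-%m-01"))  # first day of start's month
  first_end   <- as.Date(format(end,   "%Y-%m-01"))  # first day of end's month
  m_start <- seq(first_start, first_end, by = "month")
  # Last day of each month = first day of the following month minus one
  m_end <- seq(first_start, by = "month",
               length.out = length(m_start) + 1)[-1] - 1
  # Overlap in days, inclusive of both endpoints (done on plain numbers)
  days <- pmin(as.numeric(end), as.numeric(m_end)) -
          pmax(as.numeric(start), as.numeric(m_start)) + 1
  data.frame(MMMYY = paste0(month.abb[as.POSIXlt(m_start)$mon + 1],
                            format(m_start, "%y")),
             Days = as.integer(days),
             stringsAsFactors = FALSE)
}

r <- month_overlap(as.Date("2015-01-28"), as.Date("2015-03-03"))
print(r)
```

This creates one row per month touched instead of one row per day, at the cost of slightly less obvious date arithmetic.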