I have a dataset input that I would like to convert into the dataset output:
library('data.table')
input=data.table(SerialNum=c(1,2),StartDate=c('28/01/2015','28/01/2015'),EndDate=c('03/02/2015','03/03/2015'))
# SerialNum StartDate EndDate
# 1: 1 28/01/2015 03/02/2015
# 2: 2 28/01/2015 03/03/2015
output=data.table(SerialNum=c(1,1,2,2,2),
StartDate=c('28/01/2015','28/01/2015','28/01/2015','28/01/2015','28/01/2015'),
EndDate=c('03/02/2015','03/02/2015','03/03/2015','03/03/2015','03/03/2015'),
MMMYY=c('Jan15','Feb15','Jan15','Feb15','Mar15'),
Days=c(4,3,4,28,3))
# SerialNum StartDate EndDate MMMYY Days
# 1: 1 28/01/2015 03/02/2015 Jan15 4
# 2: 1 28/01/2015 03/02/2015 Feb15 3
# 3: 2 28/01/2015 03/03/2015 Jan15 4
# 4: 2 28/01/2015 03/03/2015 Feb15 28
# 5: 2 28/01/2015 03/03/2015 Mar15 3
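Basically, I am trying to reshape the dataset so that the days between two dates are split across the calendar months they fall in. Can this be done in R?
The R code above creates both datasets for ease of reference.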
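Answer 0 (score: 3)
You can do this by creating a daily sequence from StartDate to EndDate and extracting a month-year variable from it (mnth in the example below). Then count the rows by SerialNum, StartDate, EndDate and the newly created month-year variable (mnth):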
input[, .(mnth = format(seq(StartDate,EndDate,"day"), "%b%y")),
by = .(SerialNum, StartDate, EndDate)
][, .N, by = .(SerialNum, StartDate, EndDate, mnth)]
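which gives (the month abbreviations produced by %b depend on the locale, which is why this output shows lowercase names and mrt15 rather than Mar15):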
SerialNum StartDate EndDate mnth N
1: 1 2015-01-28 2015-02-03 jan15 4
2: 1 2015-01-28 2015-02-03 feb15 3
3: 2 2015-01-28 2015-03-03 jan15 4
4: 2 2015-01-28 2015-03-03 feb15 28
5: 2 2015-01-28 2015-03-03 mrt15 3
If your StartDate and EndDate columns are not stored as dates, you can convert them first:
input[, `:=` (StartDate = as.Date(StartDate,"%d/%m/%Y"),
EndDate = as.Date(EndDate,"%d/%m/%Y"))]
# or with the 'lubridate' package like @Titolondon used
library(lubridate)
input[, `:=` (StartDate = dmy(StartDate), EndDate = dmy(EndDate))]
Data used:
input <- data.table(SerialNum = c(1,2),
StartDate = as.Date(c('28/01/2015','28/01/2015'),"%d/%m/%Y"),
EndDate = as.Date(c('03/02/2015','03/03/2015'),"%d/%m/%Y"))
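Answer 1 (score: 1)
Here is a solution that uses a plain data frame rather than data.table (lubridate is still used for parsing the dates and extracting the month). sapply operates on each value of SerialNum separately. For each one we create a sequence of dates from StartDate to EndDate and then count the number of dates that fall within each month. The whole thing is wrapped in do.call(rbind, ...) to turn the resulting list into a single data frame.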
library(lubridate)
input = data.frame(SerialNum=c(1,2),StartDate=c('28/01/2015','28/01/2015'),EndDate=c('03/02/2015','03/03/2015'),
stringsAsFactors=FALSE)
input[,2:3] = lapply(input[,2:3], dmy)
do.call(rbind,
sapply(unique(input$SerialNum), function(i) {
start = input[input$SerialNum==i,"StartDate"]
end = input[input$SerialNum==i, "EndDate"]
dates = seq(start, end, by="1 day")
data.frame(SerialNum=i, StartDate=start, EndDate=end,
MMMYY=unique(format(dates, "%b%y")),
Days=sapply(split(dates, droplevels(month(dates, label=TRUE))), length))
}, simplify=FALSE))
SerialNum StartDate EndDate MMMYY Days
Jan 1 2015-01-28 2015-02-03 Jan15 4
Feb 1 2015-01-28 2015-02-03 Feb15 3
Jan1 2 2015-01-28 2015-03-03 Jan15 4
Feb1 2 2015-01-28 2015-03-03 Feb15 28
Mar 2 2015-01-28 2015-03-03 Mar15 3
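The same idea can also be sketched without any packages. This is only an illustration under a few assumptions: the helper name split_days_by_month is hypothetical (it does not appear in the answers), each input row is processed independently, and the labels are built from month.abb so that they do not depend on the locale.

```r
# Hypothetical helper (not part of the original answers):
# split one inclusive date range into day counts per calendar month.
split_days_by_month <- function(serial, start, end) {
  dates <- seq(start, end, by = "day")   # every day in the range, both ends included
  key <- format(dates, "%Y-%m")          # numeric year-month key, e.g. "2015-01"
  counts <- table(key)                   # days per month, sorted by key
  mon <- as.integer(substr(names(counts), 6, 7))
  yy <- substr(names(counts), 3, 4)
  data.frame(SerialNum = serial,
             StartDate = start,
             EndDate = end,
             MMMYY = paste0(month.abb[mon], yy),  # English labels such as "Jan15"
             Days = as.integer(counts),
             stringsAsFactors = FALSE)
}

input <- data.frame(SerialNum = c(1, 2),
                    StartDate = as.Date(c("28/01/2015", "28/01/2015"), "%d/%m/%Y"),
                    EndDate = as.Date(c("03/02/2015", "03/03/2015"), "%d/%m/%Y"))

# One small data frame per input row, stacked into a single result
output <- do.call(rbind, lapply(seq_len(nrow(input)), function(i) {
  split_days_by_month(input$SerialNum[i], input$StartDate[i], input$EndDate[i])
}))
rownames(output) <- NULL
output
```

Because the grouping key includes the year, a range longer than twelve months would keep, for example, Jan15 and Jan16 as separate rows.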
Answer 2 (score: 1)
Using data.table and lubridate:
library(data.table)
input = data.table(
SerialNum = c(1, 2),
StartDate = c('28/01/2015', '28/01/2015'),
EndDate = c('03/02/2015', '03/03/2015')
)
Use lubridate for the date manipulation:
library(lubridate)
If the columns are not already stored as dates, convert them with lubridate's dmy(), which returns Date objects:
input[, StartDate := dmy(StartDate)]
input[, EndDate := dmy(EndDate)]
The trick: build the sequence of dates between StartDate and EndDate for each SerialNum:
DT <- input[, .(seqDate = StartDate + days(0:(EndDate - StartDate))),
by = .(SerialNum, StartDate, EndDate)]
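Derive the MMMYY column from this new date sequence. I use month.abb to get English abbreviations regardless of locale, but if your locale already produces the abbreviations you want, you can use MMMYY = format(seqDate, "%b%y") instead: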
DT[, MMMYY := paste0(month.abb[month(seqDate)], format(seqDate, "%y"))]
Count the days per month (column MMMYY):
output = DT[, .(Days = .N), by = .(SerialNum, StartDate, EndDate, MMMYY)]
output
#> SerialNum StartDate EndDate MMMYY Days
#> 1: 1 2015-01-28 2015-02-03 Jan15 4
#> 2: 1 2015-01-28 2015-02-03 Feb15 3
#> 3: 2 2015-01-28 2015-03-03 Jan15 4
#> 4: 2 2015-01-28 2015-03-03 Feb15 28
#> 5: 2 2015-01-28 2015-03-03 Mar15 3
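As a quick sanity check on a result like this, the Days values for each SerialNum should add up to the inclusive length of the range, that is EndDate - StartDate + 1. A minimal base R sketch, which re-creates the result as a plain data frame so that it runs on its own (the variable names days_total, ranges and expected are illustrative only):

```r
# Result table copied from the output above, as a plain data frame
output <- data.frame(
  SerialNum = c(1, 1, 2, 2, 2),
  StartDate = as.Date(rep("2015-01-28", 5)),
  EndDate = as.Date(c("2015-02-03", "2015-02-03",
                      "2015-03-03", "2015-03-03", "2015-03-03")),
  MMMYY = c("Jan15", "Feb15", "Jan15", "Feb15", "Mar15"),
  Days = c(4, 3, 4, 28, 3),
  stringsAsFactors = FALSE
)

# Total days counted per SerialNum
days_total <- tapply(output$Days, output$SerialNum, sum)

# Inclusive length of each range: both StartDate and EndDate are counted
ranges <- unique(output[, c("SerialNum", "StartDate", "EndDate")])
expected <- as.numeric(ranges$EndDate - ranges$StartDate) + 1

# Stops with an error if any SerialNum does not add up
stopifnot(all(days_total[as.character(ranges$SerialNum)] == expected))
```

For the data in this thread the totals are 7 days for SerialNum 1 and 35 days for SerialNum 2.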