R - Repeat rows of a data frame according to split datetime intervals

Posted: 2017-03-27 21:34:27

Tags: r datetime dataframe split

I have a data frame like this:

workplace = c('wp1','wp1','wp1')
state = c("working", "stopped", "working")
startdate = as.POSIXct(c('2010-11-1 4:53.12','2010-11-1 5:25.43','2010-11-1 5:31.43'))
enddate = as.POSIXct(c('2010-11-1 5:25.43','2010-11-1 5:31.43','2010-11-1 5:32.02'))
timeline = data.frame(workplace, state, startdate, enddate)

# When printed, it looks like this
#  workplace   state           startdate             enddate
#1       wp1 working 2010-11-01 04:53:00 2010-11-01 05:25:00
#2       wp1 stopped 2010-11-01 05:25:00 2010-11-01 05:31:00
#3       wp1 working 2010-11-01 05:31:00 2010-11-01 05:32:00

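My goal is to create a time series with 15-minute intervals and assign each row to its corresponding interval, repeating the row and splitting its startdate-enddate range whenever it extends beyond its 15-minute interval. For the example given, the end result should look like this: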

#  workplace   state           startdate             enddate            interval
#1       wp1 working 2010-11-01 04:53:00 2010-11-01 05:00:00 2010-11-01 04:45:00
#2       wp1 working 2010-11-01 05:00:00 2010-11-01 05:15:00 2010-11-01 05:00:00
#3       wp1 working 2010-11-01 05:15:00 2010-11-01 05:25:00 2010-11-01 05:15:00
#4       wp1 stopped 2010-11-01 05:25:00 2010-11-01 05:30:00 2010-11-01 05:15:00
#5       wp1 stopped 2010-11-01 05:30:00 2010-11-01 05:31:00 2010-11-01 05:30:00
#6       wp1 working 2010-11-01 05:31:00 2010-11-01 05:32:00 2010-11-01 05:30:00

I managed to add the interval column with this code:

library(lubridate)
library(xts)
timeline = cbind(timeline,interval=align.time(timeline$startdate - lubridate::minutes(15), n=60*15))
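For comparison, here is a minimal base R sketch that computes the same kind of interval column without lubridate or xts, by flooring each timestamp to a multiple of 900 seconds. It is an alternative, not a description of how align.time works internally. The helper name floor_to_interval is hypothetical, and the approach assumes the local UTC offset is a whole number of quarter hours (true for most time zones), since it floors seconds since the epoch rather than local clock time.

```r
# Hypothetical helper: floor each POSIXct value to the start of its 15-minute interval.
# Assumption: flooring seconds since the epoch matches local clock boundaries,
# which holds when the UTC offset is a whole number of quarter hours.
floor_to_interval = function(x, seconds = 15 * 60) {
  x - as.numeric(x) %% seconds  # POSIXct minus a number of seconds stays POSIXct
}

# tz = "UTC" is used only to make the example reproducible
startdate = as.POSIXct(c("2010-11-01 04:53:00", "2010-11-01 05:25:00",
                         "2010-11-01 05:31:00"), tz = "UTC")
floor_to_interval(startdate)  # 04:45, 05:15 and 05:30 (UTC)
```

With the question's data frame this would be used as timeline$interval = floor_to_interval(timeline$startdate).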

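Now I need to split the datetime range between interval and enddate of each row into 15-minute steps, starting from the startdate, and repeat the row "n" times, where n is the number of 15-minute intervals. I tried creating the sequences with seq.POSIXt(timeline$interval, timeline$enddate, by=60*15), but it doesn't work with columns. How can I split each row and repeat it as many times as the number of intervals seq.POSIXt creates? Or is there an easier approach than the one I am pursuing?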
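seq() on date-times expects a single from value, which is why passing whole columns fails. A minimal base R sketch, assuming the data frame already has the interval column from the code above, applies seq() once per row, repeats each row by the length of its sequence, and then clips the start and end of every piece to its 15-minute slot. The variable names slot, slots and out are illustrative, and tz = "UTC" is an assumption made only so the example is reproducible.

```r
# Starting point: the question's data plus the interval column it already has
timeline = data.frame(
  workplace = "wp1",
  state     = c("working", "stopped", "working"),
  startdate = as.POSIXct(c("2010-11-01 04:53:00", "2010-11-01 05:25:00", "2010-11-01 05:31:00"), tz = "UTC"),
  enddate   = as.POSIXct(c("2010-11-01 05:25:00", "2010-11-01 05:31:00", "2010-11-01 05:32:00"), tz = "UTC"),
  interval  = as.POSIXct(c("2010-11-01 04:45:00", "2010-11-01 05:15:00", "2010-11-01 05:30:00"), tz = "UTC")
)
slot = 15 * 60  # interval length in seconds

# seq() is not vectorised over from/to, so build one sequence of slot starts per row
slots = lapply(seq_len(nrow(timeline)), function(i)
  seq(timeline$interval[i], timeline$enddate[i], by = "15 min"))
n = lengths(slots)  # number of 15-minute slots each row overlaps

# Repeat row i n[i] times and give each copy its own slot start
out = timeline[rep(seq_len(nrow(timeline)), n), ]
out$interval = as.POSIXct(unlist(slots), origin = "1970-01-01", tz = "UTC")

# Clip every piece to the boundaries of its slot
out$startdate = pmax(out$startdate, out$interval)
out$enddate   = pmin(out$enddate, out$interval + slot)
rownames(out) = NULL
out
```

This yields the six rows of the expected output. If an enddate falls exactly on a slot boundary (say 05:30:00), seq() includes that boundary and an extra zero-length piece appears; out[out$startdate < out$enddate, ] drops such rows.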

Thanks a lot for your help.

1 Answer:

Answer 0 (score: 0)

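In the end, I managed to build the data frame with the datetime ranges split at absolute quarter-hour boundaries. Instead of the lubridate or xts packages, I used a custom function with the help of the data.table package. Here is the full script, in case someone has the same need in the future. It should work for any interval length that divides evenly into 60 minutes (5, 10, 15, 20, 30 minutes, and so on).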

workplace = c('wp1','wp1','wp1')
state = c("working", "stopped", "working")
startDate = as.POSIXct(c('2010-11-1 4:53.12','2010-11-1 5:25.43','2010-11-1 5:31.43'))
endDate = as.POSIXct(c('2010-11-1 5:25.43','2010-11-1 5:31.43','2010-11-1 5:32.02'))
# The rest of the script refers to these columns as startDate and endDate
timeline = data.frame(workplace, state, startDate, endDate)

library(data.table)
timeline = as.data.table(timeline)

intervalMinutes = 15L
intervalSeconds = 60L * intervalMinutes

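# Function to get the absolute quarter of hour (the interval boundary at or before x)
alignInterval = function(x) {
  hourInterval = as.POSIXct(trunc(x, "hours"))
  # Measure the offset into the hour explicitly in seconds: x - hourInterval
  # picks its units automatically (often minutes), which breaks the arithmetic
  secsIntoHour = as.numeric(difftime(x, hourInterval, units = "secs"))
  hourInterval + (secsIntoHour %/% intervalSeconds) * intervalSeconds
}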

# For the original start date and end date, create the columns with the corresponding absolute quarter of hour
timeline[,`:=`(startDateInterval = alignInterval(startDate),
           endDateInterval = alignInterval(endDate))]
# Now create a new column with the number of absolute intervals between the start and end
timeline[,numIntervals := as.integer(difftime(endDateInterval, startDateInterval, units = "secs")) %/% intervalSeconds + 1L]

# Repeat each row depending on the value of the new column with the number of intervals
timeline = timeline[rep( seq(1, .N), numIntervals),]

# Give each repeated row its own quarter hour: the k-th copy within a workplace/state/startDate group gets startDateInterval + (k - 1) * 15 minutes
timeline[,interval := startDateInterval + (intervalSeconds*(seq(.N)-1)),.(workplace,state,startDate)]

# Adding 15 minutes to the interval start gives the end of each interval
timeline[,intervalEnd:= interval + intervalSeconds]

# Fix the start date in the intermediate intervals
timeline[startDate < interval, startDate := interval]

# Fix the end date in the intermediate intervals
timeline[endDate > intervalEnd, endDate := intervalEnd]

# Clear the columns created for calculation purposes
timeline[,`:=`(startDateInterval = NULL,
          endDateInterval = NULL,
          intervalEnd = NULL,
          numIntervals = NULL)]
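When adapting the alignInterval() helper, note that subtracting two date-times in R returns a difftime whose units are chosen automatically, so as.integer() of it is not necessarily a number of seconds. A small check, using tz = "UTC" only to make it reproducible; hourStart is an illustrative name, and 900 is the value of intervalSeconds for 15-minute intervals:

```r
# Subtracting two date-times gives a difftime whose units R picks automatically
x = as.POSIXct(c("2010-11-01 04:53:00", "2010-11-01 05:25:00"), tz = "UTC")
hourStart = as.POSIXct(trunc(x, "hours"))

d = x - hourStart
units(d)       # "mins": the smallest difference (25 min) is at least a minute but under an hour
as.integer(d)  # 53 25: minutes, not seconds

# Asking for seconds explicitly gives values comparable with intervalSeconds (900)
as.numeric(difftime(x, hourStart, units = "secs"))         # 3180 1500
as.numeric(difftime(x, hourStart, units = "secs")) %/% 900 # 3 1, i.e. 04:45 and 05:15
```

Dividing the minute values by 900 would instead put both timestamps in the first quarter of their hour.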