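Adding a sequence of dates to a data.table (R)

Time: 2018-07-20 20:34:05

Tags: r datatable

I have a data table of locations where events recur at different frequencies. For each location it gives the date of the last event and how often the event occurs, in days.

Example: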

dt
#    Location Last_Occurrence Frequency
# 1: Home     7-19-2018       30
# 2: School   6-6-2018        60
# 3: Moon     1-5-1993        90
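For anyone reproducing this, here is a minimal sketch of building that table. It assumes the dates arrive as month-day-year strings and that Frequency is a number of days; neither is stated explicitly above.

```r
library(data.table)

# Example input; the dates are assumed to be month-day-year strings
dt <- data.table(
  Location        = c("Home", "School", "Moon"),
  Last_Occurrence = c("7-19-2018", "6-6-2018", "1-5-1993"),
  Frequency       = c(30, 60, 90)  # assumed to be days
)

# Parse the strings into Date objects so that date arithmetic works
dt[, Last_Occurrence := as.Date(Last_Occurrence, format = "%m-%d-%Y")]
```

Once the column is a Date, adding a number to it moves it forward by that many days.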

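What I want to do is add a new column that holds, for each location, all future event dates through the end of 2018.

So I would like a table that looks like this: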

dt
#    Location Last_Occurrence Frequency Next_Dates
# 1: Home     7-19-2018       30        7-19-2018
# 2: Home     7-19-2018       30        8-18-2018
# 3: Home     7-19-2018       30        9-17-2018
# 4: Home     7-19-2018       30        10-17-2018
# 5: Home     7-19-2018       30        11-16-2018
# 6: Home     7-19-2018       30        12-16-2018
# 7: School   6-6-2018        60        6-6-2018
# 8: School   6-6-2018        60        8-5-2018
# 9: School   6-6-2018        60        10-4-2018
etc.

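How should I go about this? I suspect lapply could be useful, since I am doing this for each location...

I have already worked out how to generate a vector of future dates with a "while" loop: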

library(data.table) # provides year()

Last_Sample_Date <- Sys.Date() # For testing (the loop below only runs if this date is in 2018)
increase <- 5 # For testing
NextDate <- Last_Sample_Date + increase
multiplier <- 1

# Create vector of next sampling dates - updated with each iteration of the while loop
NextDates <- c(Last_Sample_Date, NextDate)

while (year(NextDate) == 2018) {
  multiplier <- multiplier + 1
  # Step from the original date so the spacing between dates stays constant
  NextDate <- Last_Sample_Date + multiplier * increase

  # Add to vector of next sampling dates
  NextDates <- append(NextDates, NextDate)
}

(I realize this actually produces a vector whose last date falls in 2019, but I can live with that.)
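If the overshoot into 2019 matters, seq() can build the same kind of vector without a loop. This is only a sketch: the fixed start date and the end-of-2018 cutoff (`end_date`) are assumptions made for illustration.

```r
# Assumed inputs for illustration (a fixed date instead of Sys.Date())
Last_Sample_Date <- as.Date("2018-07-19")
increase <- 5                      # days between samples
end_date <- as.Date("2018-12-31")  # assumed cutoff: end of 2018

# seq() stops at the last date on or before end_date,
# so no date spills over into 2019
NextDates <- seq(Last_Sample_Date, end_date, by = increase)
```

For a Date start value, a numeric `by` is interpreted as a number of days.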

Can I use the while loop somehow, or is there another way to solve this?

2 Answers:

Answer 0: (score: 1)

My version, using data.table:

library(data.table)

# create example dataset
dt <- data.table(
        location = c("home", "school", "moon"),
        orig_date = as.Date(c("2018-07-19", "2018-06-06", "2015-01-05")),
        freq_days = c(30, 60, 90)
)

# figure out how many new rows are needed
dt[ , rows_needed := length(seq(from=orig_date, to=as.Date("2018-12-31"), by=paste(freq_days,"days"))), by=location]

# expand the data.table to include the new rows
dt <- dt[rep(1:nrow(dt), times=rows_needed)]

# add the dates of occurrence
dt[ , date_of_occurrence := seq(from=orig_date[1], to=as.Date("2018-12-31"), by=paste(freq_days[1],"days")), by=location]

# shift dates of occurrence to get next date
dt[ , next_date := shift(date_of_occurrence, type="lead"), by=location]

# drop rows where next occurrence is after 2018 (should you want this)
dt <- dt[!is.na(next_date)]
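The same expansion can also be written as a single grouped call. This is a sketch of an alternative, not the answer's own method; `end_date` and `expanded` are names introduced here for illustration. Unlike the steps above, it keeps the original date as the first row for each location, which matches the table shown in the question.

```r
library(data.table)

dt <- data.table(
  location  = c("home", "school", "moon"),
  orig_date = as.Date(c("2018-07-19", "2018-06-06", "2015-01-05")),
  freq_days = c(30, 60, 90)
)

end_date <- as.Date("2018-12-31")  # assumed cutoff: end of 2018

# seq() runs once per group; the grouping columns are repeated
# alongside every generated date, giving one row per occurrence
expanded <- dt[
  ,
  .(next_date = seq(orig_date, end_date, by = paste(freq_days, "days"))),
  by = .(location, orig_date, freq_days)
]
```

Grouping by all three columns keeps them in the result and guarantees that `orig_date` and `freq_days` are single values inside each call to seq().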

Answer 1: (score: 0)

IIUC, you can use complete from tidyr:

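library(dplyr)
library(tidyr)
# df is assumed to hold the example data, with Last_Occurrence as a Date
df %>% group_by(Location,Frequency,Last_Occurrence) %>%
      mutate(next_date=Last_Occurrence)%>%
      complete(next_date=seq(from = next_date, to = as.Date("2018-12-31"),by = Frequency))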

# A tibble: 10 x 4
# Groups:   Location, Frequency, Last_Occurrence [2]
   Location Frequency Last_Occurrence  next_date
      <chr>     <int>          <date>     <date>
 1     Home        30      2018-07-19 2018-07-19
 2     Home        30      2018-07-19 2018-08-18
 3     Home        30      2018-07-19 2018-09-17
 4     Home        30      2018-07-19 2018-10-17
 5     Home        30      2018-07-19 2018-11-16
 6     Home        30      2018-07-19 2018-12-16
 7   School        60      2018-06-06 2018-06-06
 8   School        60      2018-06-06 2018-08-05
 9   School        60      2018-06-06 2018-10-04
10   School        60      2018-06-06 2018-12-03