I have an input data frame with StartDate and EndDate columns, formatted as dates:
input_df:
C1 C2 StartDate EndDate
A B 9/5/2019 12/14/2019
C D 4/12/2019 5/14/2019
E F 12/5/2019 12/15/2019
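I am trying to produce the following output based on these conditions:
- If Sys.Date() is less than or equal to EndDate, keep the row and add another row with the year increased by 1
- If Sys.Date() is greater than EndDate, replace the year 2019 with 2020
The desired output is: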
output_df:
C1 C2 StartDate EndDate
A B 9/5/2019 12/14/2019
A B 9/5/2020 12/14/2020
C D 4/12/2020 5/14/2020
E F 12/5/2019 12/15/2019
E F 12/5/2020 12/15/2020
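I have looked into split_rows and lubridate, but I am not sure how to combine an if condition with those functions. The data frame is large, so I would like to avoid a for loop.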
Answer 0 (score: 1)
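One option is to add one year to the 'StartDate' and 'EndDate' columns, bind the result with the original dataset, and then, within each (C1, C2) group, keep either both rows or only the shifted row depending on how EndDate compares with Sys.Date().
Because the result depends on Sys.Date(), the output shown corresponds to a run date after 5/14/2019 and before 12/14/2019.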
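library(dplyr)
library(lubridate)

input_df %>%
  # copy of the data with both date columns moved forward one year
  mutate_at(3:4, ~ mdy(.) %m+% years(1)) %>%
  # stack it on the original rows (dates parsed from month/day/year strings)
  bind_rows(input_df %>%
              mutate_at(3:4, mdy)) %>%
  arrange_all() %>%
  group_by(C1, C2) %>%
  # EndDate already passed: keep only the last (shifted) row; otherwise keep both
  slice(if (first(EndDate) < Sys.Date()) n() else row_number())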
# A tibble: 5 x 4
# Groups: C1, C2 [3]
# C1 C2 StartDate EndDate
# <chr> <chr> <date> <date>
#1 A B 2019-09-05 2019-12-14
#2 A B 2020-09-05 2020-12-14
#3 C D 2020-04-12 2020-05-14
#4 E F 2019-12-05 2019-12-15
#5 E F 2020-12-05 2020-12-15
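Another option is to expand the rows with uncount according to the condition, and then replace the last row of each group with its dates moved forward one year.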
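library(tidyr)

input_df %>%
  mutate_at(3:4, mdy) %>%
  # n is 2 when EndDate has not passed yet, otherwise 1
  mutate(n = 1 + (Sys.Date() <= EndDate)) %>%
  # repeat each row n times (uncount drops the n column)
  uncount(n) %>%
  group_by(C1, C2) %>%
  # add one year to the dates in the last row of each group
  mutate_at(vars(-group_cols()), ~ replace(., n(), .[n()] + years(1)))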
# A tibble: 5 x 4
# Groups: C1, C2 [3]
# C1 C2 StartDate EndDate
# <chr> <chr> <date> <date>
#1 A B 2019-09-05 2019-12-14
#2 A B 2020-09-05 2020-12-14
#3 C D 2020-04-12 2020-05-14
#4 E F 2019-12-05 2019-12-15
#5 E F 2020-12-05 2020-12-15
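Or, using base R:
nm1 <- c('StartDate', 'EndDate')
# parse the month/day/year strings into Date objects
input_df[nm1] <- lapply(input_df[nm1], as.Date, format = "%m/%d/%Y")
# TRUE for rows whose EndDate has not passed yet
i1 <- Sys.Date() <= input_df$EndDate
# for those rows, build the pair (original date, date one year later)
lst1 <- lapply(input_df[i1, nm1], function(date)
  do.call(c, lapply(date, seq, length.out = 2, by = '1 year')))
# duplicate the rows where i1 is TRUE, then fill in the date pairs
input_df2 <- input_df[rep(seq_len(nrow(input_df)), i1 + 1), ]
input_df2[rep(i1, i1 + 1), nm1] <- lst1
# note: rows whose EndDate has already passed are left unchanged by this snippet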
# Data used in the examples above
input_df <- structure(list(C1 = c("A", "C", "E"), C2 = c("B", "D", "F"),
    StartDate = c("9/5/2019", "4/12/2019", "12/5/2019"),
    EndDate = c("12/14/2019", "5/14/2019", "12/15/2019")),
    class = "data.frame", row.names = c(NA, -3L))
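Building on the base R approach above, the following is a minimal sketch that covers both conditions from the question in one pass and takes the reference date as an argument, so the result can be reproduced regardless of when it is run. The names add_year, expand_dates, today, and demo_df are hypothetical and not part of the original answer; the year shift goes through as.POSIXlt, which is an assumption of this sketch rather than the method used above.

```r
# Hypothetical helper (not from the original answer): move Date values forward one year
add_year <- function(d) {
  lt <- as.POSIXlt(d)
  lt$year <- lt$year + 1L
  as.Date(lt)
}

# Hypothetical wrapper: `today` stands in for Sys.Date() so runs are reproducible
expand_dates <- function(df, today = Sys.Date(),
                         cols = c("StartDate", "EndDate")) {
  df[cols] <- lapply(df[cols], as.Date, format = "%m/%d/%Y")
  keep <- today <= df$EndDate                 # TRUE: keep the row and add a copy
  idx <- rep(seq_len(nrow(df)), keep + 1L)    # duplicate row indices where keep is TRUE
  out <- df[idx, , drop = FALSE]
  last <- !duplicated(idx, fromLast = TRUE)   # last copy of each original row
  # shift only the last copy: the added row, or the single row if EndDate has passed
  out[cols] <- lapply(out[cols], function(d) {
    d[last] <- add_year(d[last])
    d
  })
  rownames(out) <- NULL
  out
}

# Copy of the sample data from the question, redefined so the block is self-contained
demo_df <- data.frame(
  C1 = c("A", "C", "E"), C2 = c("B", "D", "F"),
  StartDate = c("9/5/2019", "4/12/2019", "12/5/2019"),
  EndDate = c("12/14/2019", "5/14/2019", "12/15/2019"),
  stringsAsFactors = FALSE
)

# The reference date here is an arbitrary choice between the two groups of end dates
expand_dates(demo_df, today = as.Date("2019-10-01"))
```

With a reference date between 5/14/2019 and 12/14/2019, this should give the five rows of output_df. One design note: shifting through as.POSIXlt turns February 29 into March 1 of the following year, which may or may not be what you want for leap-day dates.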