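I have looked through Stack Overflow and found many variations of what I need, but none of them works for me.
I have a large dataset with 116 columns and 326,438 rows. I need to split every row into two rows, using the existing date fields for the calculation, and add two new date columns, "StartDate" and "EndDate".
If row 1 shows a PolicyEffectiveDate of 01/06/2018 and a PolicyRenewalDate of 01/06/2019 (dd/mm/yyyy), I need the data to reflect two rows, as follows:
The first row would show a StartDate of 01/06/2018 and an EndDate of 31/12/2018, and the next row a StartDate of 01/01/2019 and an EndDate of 31/05/2019. StartDate and EndDate are new columns created in this process. All other data on the new row should match the first entry. In effect, we are creating two rows from one, and all data match except the two new fields.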
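To make the arithmetic concrete, the split for that single example row might be sketched in base R as follows. The variable names are illustrative only, and the sketch assumes the policy starts in one calendar year and renews in the next.

```r
# Worked example for one policy row, base R only (names are illustrative).
effective <- as.Date("2018-06-01")
renewal   <- as.Date("2019-06-01")

# Last day of the calendar year in which the policy starts.
year_end <- as.Date(paste0(format(effective, "%Y"), "-12-31"))

# First row: effective date to year end.
first_half  <- data.frame(StartDate = effective,    EndDate = year_end)
# Second row: 1 January of the next year to the day before renewal.
second_half <- data.frame(StartDate = year_end + 1, EndDate = renewal - 1)

split_rows <- rbind(first_half, second_half)
print(split_rows)
```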
What I currently have is:
PolicyEffectiveDate  PolicyRenewalDate  Customer
2017-06-01           2018-06-01         Arc Ltd
2017-04-03           2018-04-03         Windonian CC
What I need is this:
PolicyStartDate  PolicyEndDate  Customer
2017-06-01       2017-12-31     Arc Ltd
2018-01-01       2018-05-31     Arc Ltd
2017-04-03       2017-12-31     Windonian CC
2018-01-01       2018-04-02     Windonian CC
The code to create these two example data frames is:
mydf <- data.frame(PolicyEffectiveDate = as.Date(c("2017-06-01", "2017-04-03")),
                   PolicyRenewalDate = as.Date(c("2018-06-01", "2018-04-03")),
                   Customer = as.character(c("Arc Ltd", "Windonian CC")),
                   stringsAsFactors = FALSE)
newdf <- data.frame(PolicyStartDate = as.Date(c("2017-06-01", "2018-01-01", "2017-04-03", "2018-01-01")),
                    PolicyEndDate = as.Date(c("2017-12-31", "2018-05-31", "2017-12-31", "2018-04-02")),
                    Customer = as.character(c("Arc Ltd", "Arc Ltd", "Windonian CC", "Windonian CC")),
                    stringsAsFactors = FALSE)
Answer 0 (score: 0)
You can use ceiling_date from lubridate:
library(lubridate)
library(dplyr)

mydf %>%
  # First half: from the effective date to 31 December of the same year
  mutate(PolicyRenewalDate = ceiling_date(PolicyEffectiveDate, "year") - 1) %>%
  # Second half: `.` is the first-half frame piped in, so this starts on 1 January
  bind_rows(mutate(mydf,
                   PolicyEffectiveDate = .$PolicyRenewalDate + 1,
                   PolicyRenewalDate = PolicyRenewalDate - 1)) %>%
  arrange(Customer) %>%
  rename(PolicyStartDate = PolicyEffectiveDate,
         PolicyEndDate = PolicyRenewalDate)
#### OUTPUT ####
  PolicyStartDate PolicyEndDate     Customer
1      2017-06-01    2017-12-31      Arc Ltd
2      2018-01-01    2018-05-31      Arc Ltd
3      2017-04-03    2017-12-31 Windonian CC
4      2018-01-01    2018-04-02 Windonian CC
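For comparison, the same split can be sketched in base R without extra packages. This is only a sketch: the helper name split_policy_by_year is hypothetical (it does not come from the thread), and it assumes both date columns are of class Date and that every policy renews in the calendar year immediately after it starts. All other columns are copied to both rows unchanged.

```r
# Hypothetical helper (illustrative name, not from the thread): split each
# policy row at the end of the calendar year in which the policy starts.
# Assumes Date columns and a renewal in the following calendar year.
split_policy_by_year <- function(d) {
  # 31 December of each policy's starting year
  year_end <- as.Date(paste0(format(d$PolicyEffectiveDate, "%Y"), "-12-31"))

  # Every column except the two original date fields is carried over as-is.
  keep  <- setdiff(names(d), c("PolicyEffectiveDate", "PolicyRenewalDate"))
  other <- d[, keep, drop = FALSE]

  first  <- cbind(data.frame(PolicyStartDate = d$PolicyEffectiveDate,
                             PolicyEndDate   = year_end), other)
  second <- cbind(data.frame(PolicyStartDate = year_end + 1,
                             PolicyEndDate   = d$PolicyRenewalDate - 1), other)

  out <- rbind(first, second)
  # Interleave so both rows of each original policy sit next to each other.
  out <- out[order(rep(seq_len(nrow(d)), times = 2)), , drop = FALSE]
  rownames(out) <- NULL
  out
}

mydf <- data.frame(PolicyEffectiveDate = as.Date(c("2017-06-01", "2017-04-03")),
                   PolicyRenewalDate = as.Date(c("2018-06-01", "2018-04-03")),
                   Customer = c("Arc Ltd", "Windonian CC"),
                   stringsAsFactors = FALSE)

result <- split_policy_by_year(mydf)
print(result)
```

Because the rows are interleaved by original row position rather than sorted by Customer, the two halves of a policy stay adjacent even when a customer appears more than once in the data.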