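My date range is shown below. Two of the dates are marked as payDay. For every date within 3 days before or after a payday, I want to return the number of days before or after payDay.
Below, whatIHave illustrates my data and whatIWant shows the desired result. I would like to do this in dplyr. Any help would be appreciated. Thanks.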
whatIHave <- data.frame(
  date = seq(as.Date("2019/11/01"), as.Date("2019/12/01"), "days"),
  payDay = c(0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0))
whatIWant <- data.frame(
  date = seq(as.Date("2019/11/01"), as.Date("2019/12/01"), "days"),
  payDay = c(0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0),
  payDayLag = c(0,0,0,0,0,0,0,0,0,-3,-2,-1,0,1,2,3,0,0,0,0,0,-3,-2,-1,0,1,2,3,0,0,0))
Answer 0 (score: 1)
One option is to first identify the rows where 'payDay' is 1
library(data.table)
library(dplyr)
ind <- which(whatIHave$payDay == 1)
Then create a sequence of row indices around each value of 'ind' (3 rows before to 3 rows after)
v1 <- unlist(lapply(ind, function(i) (i-3):(i+3)))
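To make the intermediate objects concrete, here is a small base R sketch that rebuilds the payDay vector from the question and prints 'ind' and 'v1'. The rep() shorthand is just an assumption for brevity; it produces the same 31 values as the vector in the question.

```r
# Same payDay values as in the question, written with rep() for brevity
payDay <- c(rep(0, 12), 1, rep(0, 11), 1, rep(0, 6))

# Row positions of the paydays
ind <- which(payDay == 1)
print(ind)   # 13 25

# Row positions that fall within 3 rows of a payday
v1 <- unlist(lapply(ind, function(i) (i - 3):(i + 3)))
print(v1)    # 10 11 12 13 14 15 16 22 23 24 25 26 27 28
```

Note that if a payday sat within 3 rows of the start or end of the data, this sequence would contain indices outside the data, so it may be worth checking for that case on real data.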
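Next, build a grouping variable from the logical vector row_number() %in% v1, so that each window around a payday becomes its own group. Within each group, create 'payDayLag' by subtracting the row index where 'payDay' is 1 from row_number(); groups without a payday get 0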
whatIHave %>%
  group_by(group = rleid(row_number() %in% v1)) %>%
  mutate(payDayLag = if(all(payDay == 0)) 0
         else row_number() - row_number()[payDay==1]) %>%
  ungroup %>%
  select(-group)
# A tibble: 31 x 3
#   date       payDay payDayLag
#   <date>      <dbl>     <dbl>
# 1 2019-11-01      0         0
# 2 2019-11-02      0         0
# 3 2019-11-03      0         0
# 4 2019-11-04      0         0
# 5 2019-11-05      0         0
# 6 2019-11-06      0         0
# 7 2019-11-07      0         0
# 8 2019-11-08      0         0
# 9 2019-11-09      0         0
#10 2019-11-10      0        -3
# … with 21 more rows
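The grouping step relies on rleid() from data.table, which gives every run of identical consecutive values its own id. A tiny sketch on a made-up logical vector (in_window is a hypothetical name, not from the answer) shows why each payday window ends up in a separate group:

```r
library(data.table)

# Hypothetical stand-in for row_number() %in% v1
in_window <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)

# Each run of consecutive equal values gets a new id
run_id <- rleid(in_window)
print(run_id)   # 1 1 2 2 2 3 4
```

Because the ids change whenever the value flips, two windows that touch or overlap would merge into a single run, so this grouping assumes the paydays are more than 6 rows apart.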
If we want to do this in a single chain
library(tidyverse)
whatIHave %>%
  mutate(ind = row_number() * payDay) %>%
  filter(payDay == 1) %>%
  mutate(ind = map(ind, ~ (.x-3):(.x+3))) %>%
  group_by(grp = row_number()) %>%
  unnest(ind) %>%   # name the column explicitly; newer tidyr versions warn otherwise
  mutate(payDayLag = row_number() - row_number()[4]) %>%
  ungroup %>%
  select(-payDay, -grp, -date) %>%
  right_join(whatIHave %>%
               mutate(ind = row_number()), by = "ind") %>%
  arrange(ind) %>%  # restore row order; newer dplyr versions put unmatched rows last
  mutate(payDayLag = replace_na(payDayLag, 0))
Or without a join
whatIHave %>%
  mutate(ind = list(map(which(payDay == 1), ~ (.x -3):(.x + 3)))) %>%
  group_by(grp = rleid(row_number() %in% unlist(ind[[1]]) )) %>%
  select(-ind) %>%
  mutate(payDayLag = if(all(payDay == 0)) 0
         else row_number() - row_number()[payDay == 1]) %>%
  ungroup %>%
  select(-grp)
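For comparison, here is a rough sketch of a different technique that avoids data.table and the run-length grouping: compute the signed row distance to the nearest payday, then zero out anything outside the window. This is not the approach used in the answer above; the names pay_rows, half_width and nearest are hypothetical, and it assumes one row per consecutive day, as in the question.

```r
library(dplyr)

# Data from the question (payDay written with rep() for brevity)
whatIHave <- data.frame(
  date = seq(as.Date("2019/11/01"), as.Date("2019/12/01"), "days"),
  payDay = c(rep(0, 12), 1, rep(0, 11), 1, rep(0, 6)))

half_width <- 3                              # days on each side of a payday
pay_rows <- which(whatIHave$payDay == 1)     # row positions of the paydays

result <- whatIHave %>%
  mutate(
    # signed distance in rows from each row to its nearest payday
    nearest = sapply(row_number(), function(i) {
      d <- i - pay_rows
      d[which.min(abs(d))]
    }),
    # keep the distance only inside the window, otherwise 0
    payDayLag = if_else(abs(nearest) <= half_width, nearest, 0L)
  ) %>%
  select(-nearest)
```

Since every row is handled independently, this version does not depend on the windows being separated from each other, although a row equally close to two paydays is assigned to the earlier one.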