My data frame is as follows:
DATE <- as.Date(c('2016-12-01', '2016-12-02', '2016-12-03', '2016-12-04', '2016-12-01', '2016-12-03', '2016-12-04', '2016-12-04' ))
Parent <- c('A','A','A','A','A','A','A','B')
Child <- c('ab', 'ab', 'ab', 'ab', 'ac','ac', 'ac','bd')
salary <- c(1000, 100, 4000, 2000,1000,3455,1234,600)
avg_child_salary <- c(500, 500, 500, 500, 300, 300, 300, 9000)
Callout <- c('HIGH', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'LOW')
employ.data <- data.frame(DATE, Parent, Child, avg_child_salary, salary, Callout)
employ.data
DATE Parent Child avg_child_salary salary Callout
1 2016-12-01 A ab 500 1000 HIGH
2 2016-12-02 A ab 500 100 LOW
3 2016-12-03 A ab 500 4000 HIGH
4 2016-12-04 A ab 500 2000 HIGH
5 2016-12-01 A ac 300 1000 HIGH
6 2016-12-03 A ac 300 3455 HIGH
7 2016-12-04 A ac 300 1234 HIGH
8 2016-12-04 B bd 9000 600 LOW
I have filtered the data down to yesterday's date, 2016-12-04, as shown below:
library(dplyr)
yesterday <- Sys.Date() - 1
df2 <- filter(employ.data, DATE == yesterday)
df2
DATE Parent Child avg_child_salary salary Callout
4 2016-12-04 A ab 500 2000 HIGH
7 2016-12-04 A ac 300 1234 HIGH
8 2016-12-04 B bd 9000 600 LOW
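My goal is to add a column next to Callout showing, for each Child, the number of consecutive days up to 2016-12-04 on which it was called out as HIGH or LOW, based on the employ.data data frame. This is the final output I need: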
DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
4 2016-12-04 A ab 500 2000 HIGH 2
7 2016-12-04 A ac 300 1234 HIGH 2
8 2016-12-04 B bd 9000 600 LOW 1
Thanks!
Answer 0: (score: 2)
Try this, my man:
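library(dplyr)
library(lubridate)
df3 <- df2 %>%
  group_by(Child, Callout) %>%  # column names are case-sensitive
  mutate(DATE = ymd(DATE),
         # 1 when the previous row in the group is exactly one day earlier;
         # lag() gives NA on the first row of a group, treated as 0 here
         consecutive_day_flag = if_else(DATE == (lag(DATE) + days(1)), 1, 0, missing = 0),
         how_many = sum(consecutive_day_flag))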
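For comparison, the consecutive-day check can also be written as an explicit backward walk from the reference date. This is only a sketch in base R: the helper name streak_ending_on and the fixed ref_date (used instead of Sys.Date() - 1 so the 2016 sample data still match) are assumptions for illustration and are not part of either answer.

```r
# Sample data from the question
employ.data <- data.frame(
  DATE = as.Date(c('2016-12-01', '2016-12-02', '2016-12-03', '2016-12-04',
                   '2016-12-01', '2016-12-03', '2016-12-04', '2016-12-04')),
  Child = c('ab', 'ab', 'ab', 'ab', 'ac', 'ac', 'ac', 'bd'),
  Callout = c('HIGH', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'LOW')
)

# Hypothetical helper: length of the run of identical Callout values on
# consecutive calendar days that ends on ref_date (0 if that date is absent)
streak_ending_on <- function(dates, callout, ref_date) {
  ord <- order(dates)
  dates <- dates[ord]
  callout <- as.character(callout[ord])
  end <- which(dates == ref_date)
  if (length(end) != 1) return(0L)
  n <- 1L
  i <- end
  # Step back while the previous row is exactly one day earlier
  # and carries the same Callout as the ref_date row
  while (i > 1 &&
         dates[i - 1] == dates[i] - 1 &&
         callout[i - 1] == callout[end]) {
    n <- n + 1L
    i <- i - 1L
  }
  n
}

ref_date <- as.Date('2016-12-04')  # assumed fixed "yesterday"
streaks <- sapply(split(employ.data, employ.data$Child),
                  function(g) streak_ending_on(g$DATE, g$Callout, ref_date))
print(streaks)  # ab = 2, ac = 2, bd = 1
```

The loop stops at the first gap in dates or change in Callout, so it counts only the streak that ends on ref_date.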
Answer 1: (score: 1)
Here is another, admittedly messy, approach, but I think it does what you want:
library(dplyr)
yesterday <- as.Date(Sys.Date()-1)
df2 <- employ.data %>% group_by(Child) %>%
mutate(`Consec. Days with Callout`=cumsum(rev(cumprod(rev((yesterday-DATE)==(which(DATE == yesterday)-row_number()) & Callout==Callout[DATE == yesterday]))))) %>%
filter(DATE == yesterday)
##Source: local data frame [3 x 7]
##Groups: Child [3]
##
## DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
## <date> <fctr> <fctr> <dbl> <dbl> <fctr> <dbl>
##1 2016-12-04 A ab 500 2000 HIGH 2
##2 2016-12-04 A ac 300 1234 HIGH 2
##3 2016-12-04 B bd 9000 600 LOW 1
Notes:
The condition
(yesterday-DATE)==(which(DATE == yesterday)-row_number()) & Callout==Callout[DATE == yesterday]
is TRUE for a row when its Callout matches the Callout of the yesterday row and its row distance from the yesterday row equals its distance in days from yesterday. This gives the Cond column shown below:
Source: local data frame [8 x 7]
Groups: Child [3]
DATE Parent Child avg_child_salary salary Callout Cond
<date> <fctr> <fctr> <dbl> <dbl> <fctr> <lgl>
1 2016-12-01 A ab 500 1000 HIGH TRUE
2 2016-12-02 A ab 500 100 LOW FALSE
3 2016-12-03 A ab 500 4000 HIGH TRUE
4 2016-12-04 A ab 500 2000 HIGH TRUE
5 2016-12-01 A ac 300 1000 HIGH FALSE
6 2016-12-03 A ac 300 3455 HIGH TRUE
7 2016-12-04 A ac 300 1234 HIGH TRUE
8 2016-12-04 B bd 9000 600 LOW TRUE
Given this, we want to count the number of consecutive TRUE values going backwards from the yesterday row (grouped by Child). To do this, we reverse the vector with rev, apply cumprod, which switches from 1 to 0 as soon as a FALSE is encountered, reverse the vector again with rev, and finally apply cumsum to accumulate the number of consecutive days. The Consec. Days with Callout column can then be read as the number of consecutive days, counting up to each row, that have the same Callout as yesterday:
Source: local data frame [8 x 7]
Groups: Child [3]
DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
<date> <fctr> <fctr> <dbl> <dbl> <fctr> <dbl>
1 2016-12-01 A ab 500 1000 HIGH 0
2 2016-12-02 A ab 500 100 LOW 0
3 2016-12-03 A ab 500 4000 HIGH 1
4 2016-12-04 A ab 500 2000 HIGH 2
5 2016-12-01 A ac 300 1000 HIGH 0
6 2016-12-03 A ac 300 3455 HIGH 1
7 2016-12-04 A ac 300 1234 HIGH 2
8 2016-12-04 B bd 9000 600 LOW 1
Finally, filtering on DATE == yesterday, as in your code, produces the final result.
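The rev / cumprod / rev / cumsum chain from the notes can be traced on a single vector. The sketch below uses the Cond values for Child ab from the first table; the intermediate variable names are made up for illustration.

```r
# Cond values for Child "ab", ordered by DATE (last element is the yesterday row)
cond <- c(TRUE, FALSE, TRUE, TRUE)

# Reverse so the vector starts at the yesterday row
reversed <- rev(cond)               # TRUE TRUE FALSE TRUE

# cumprod stays 1 until the first FALSE, then is 0 for every later element
still_in_run <- cumprod(reversed)   # 1 1 0 0

# Restore the original row order
in_run <- rev(still_in_run)         # 0 0 1 1

# Running count of days in the streak that ends on the yesterday row
consec <- cumsum(in_run)            # 0 0 1 2
print(consec)
```

The last element, 2, is the value reported for ab in the final output.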