In R

Time: 2016-12-05 21:26:58

Tags: r join dplyr

My data frame is as follows:

DATE <- as.Date(c('2016-12-01', '2016-12-02', '2016-12-03', '2016-12-04', '2016-12-01', '2016-12-03', '2016-12-04', '2016-12-04' ))
Parent <- c('A','A','A','A','A','A','A','B')
Child <- c('ab', 'ab', 'ab', 'ab', 'ac','ac', 'ac','bd')
salary <- c(1000, 100, 4000, 2000,1000,3455,1234,600)
avg_child_salary <- c(500, 500, 500, 500, 300, 300, 300, 9000)
Callout <- c('HIGH', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'LOW')
employ.data <- data.frame(DATE, Parent, Child, avg_child_salary, salary, Callout)

employ.data

        DATE Parent Child avg_child_salary salary Callout
1 2016-12-01      A    ab              500   1000    HIGH
2 2016-12-02      A    ab              500    100     LOW
3 2016-12-03      A    ab              500   4000    HIGH
4 2016-12-04      A    ab              500   2000    HIGH
5 2016-12-01      A    ac              300   1000    HIGH
6 2016-12-03      A    ac              300   3455    HIGH
7 2016-12-04      A    ac              300   1234    HIGH
8 2016-12-04      B    bd             9000    600     LOW

I have filtered the data down to yesterday (2016-12-04), as shown below:

library(dplyr)
yesterday <- Sys.Date() - 1
df2 <- filter(employ.data, DATE == yesterday)
df2

            DATE Parent Child avg_child_salary salary Callout
    4 2016-12-04      A    ab              500   2000    HIGH
    7 2016-12-04      A    ac              300   1234    HIGH
    8 2016-12-04      B    bd             9000    600     LOW

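My goal is to add a column next to Callout that shows, based on the employ.data data frame, how many consecutive days each Child has had its 2016-12-04 Callout (HIGH or LOW). This is the final output I need: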

            DATE Parent Child avg_child_salary salary Callout   Consec. Days with Callout
    4 2016-12-04      A    ab              500   2000    HIGH                           2
    7 2016-12-04      A    ac              300   1234    HIGH                           2
    8 2016-12-04      B    bd             9000    600     LOW                           1

Thanks!

2 answers:

Answer 0 (score: 2)

Try this, my man

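library(dplyr)
library(lubridate)

# Use the full history (employ.data): df2 holds only yesterday's rows, so lag() would have nothing to compare
df3 <- employ.data %>% 
       group_by(Child, Callout) %>%                  # column names are case-sensitive
       mutate(DATE = ymd(DATE), 
              # 1 when the previous row in the group is exactly one day earlier; the first row counts as 0
              consecutive_day_flag = if_else(DATE == (lag(DATE) + days(1)), 1, 0, missing = 0),
              how_many = sum(consecutive_day_flag))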
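
To see what the lag-based flag counts, here is a small base-R sketch on the dates of Child ab with Callout HIGH from the question. It is only an illustration, not part of the answer's code, and the names dates, flag and how_many_pairs are made up for it.

```r
# Dates on which Child "ab" was HIGH in employ.data
dates <- as.Date(c("2016-12-01", "2016-12-03", "2016-12-04"))

# Same test as DATE == lag(DATE) + days(1); the first row has no predecessor, so it is FALSE
flag <- c(FALSE, as.numeric(diff(dates)) == 1)

# Summing the flags counts pairs of adjacent days
how_many_pairs <- sum(flag)
print(how_many_pairs)  # 1
```

The sum counts adjacent-day pairs, so the two-day run from 2016-12-03 to 2016-12-04 gives 1. Adding 1 would give the run length only when the group holds a single unbroken run ending yesterday.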

Answer 1 (score: 1)

Here is another, rather messy, approach, but I think it gives what you want:

library(dplyr)
yesterday <- as.Date(Sys.Date()-1)
df2 <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout` = cumsum(rev(cumprod(rev(
    # TRUE when the row's distance from the yesterday row matches its distance in days, with the same Callout
    (yesterday - DATE) == (which(DATE == yesterday) - row_number()) &
      Callout == Callout[DATE == yesterday]
  ))))) %>%
  filter(DATE == yesterday)
##Source: local data frame [3 x 7]
##Groups: Child [3]
##
##        DATE Parent  Child avg_child_salary salary Callout Consec. Days with Callout
##      <date> <fctr> <fctr>            <dbl>  <dbl>  <fctr>                     <dbl>
##1 2016-12-04      A     ab              500   2000    HIGH                         2
##2 2016-12-04      A     ac              300   1234    HIGH                         2
##3 2016-12-04      B     bd             9000    600     LOW                         1

Notes:

  1. The condition (yesterday-DATE)==(which(DATE == yesterday)-row_number()) & Callout==Callout[DATE == yesterday] is TRUE when a row's distance in rows from the yesterday row equals its distance in days from yesterday, and the row's Callout is the same as the Callout of the yesterday row. This gives the Cond column, as follows:

    Source: local data frame [8 x 7]
    Groups: Child [3]

            DATE Parent  Child avg_child_salary salary Callout  Cond
          <date> <fctr> <fctr>            <dbl>  <dbl>  <fctr> <lgl>
    1 2016-12-01      A     ab              500   1000    HIGH  TRUE
    2 2016-12-02      A     ab              500    100     LOW FALSE
    3 2016-12-03      A     ab              500   4000    HIGH  TRUE
    4 2016-12-04      A     ab              500   2000    HIGH  TRUE
    5 2016-12-01      A     ac              300   1000    HIGH FALSE
    6 2016-12-03      A     ac              300   3455    HIGH  TRUE
    7 2016-12-04      A     ac              300   1234    HIGH  TRUE
    8 2016-12-04      B     bd             9000    600     LOW  TRUE

  2. Given this, we want to count the number of consecutive TRUE values backwards from the yesterday row (grouped by Child). To do this, we reverse the vector with rev, apply cumprod, which switches from 1 to 0 as soon as a FALSE is encountered, reverse the vector again with rev, and finally apply cumsum to accumulate the consecutive days. The resulting Consec. Days with Callout column is the number of consecutive days, up to and including each row, within the unbroken run that ends yesterday with yesterday's Callout (0 for rows outside that run):

    Source: local data frame [8 x 7]
    Groups: Child [3]

            DATE Parent  Child avg_child_salary salary Callout Consec. Days with Callout
          <date> <fctr> <fctr>            <dbl>  <dbl>  <fctr>                     <dbl>
    1 2016-12-01      A     ab              500   1000    HIGH                         0
    2 2016-12-02      A     ab              500    100     LOW                         0
    3 2016-12-03      A     ab              500   4000    HIGH                         1
    4 2016-12-04      A     ab              500   2000    HIGH                         2
    5 2016-12-01      A     ac              300   1000    HIGH                         0
    6 2016-12-03      A     ac              300   3455    HIGH                         1
    7 2016-12-04      A     ac              300   1234    HIGH                         2
    8 2016-12-04      B     bd             9000    600     LOW                         1

  3. Finally, filter on your yesterday date to produce the final result.
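
The reverse, cumprod, reverse, cumsum step in note 2 can be tried on its own. The following is a minimal base-R sketch that hard-codes the Cond values that note 1 lists for Child ab; the variable names cond, still_in_run and consec are only illustrative.

```r
# Cond values for Child "ab", ordered by DATE (2016-12-01 to 2016-12-04)
cond <- c(TRUE, FALSE, TRUE, TRUE)

# Walk backwards from yesterday: cumprod stays 1 until the first FALSE, then drops to 0
still_in_run <- rev(cumprod(rev(cond)))

# Accumulate the surviving 1s; the last element is the length of the run ending yesterday
consec <- cumsum(still_in_run)

print(still_in_run)  # 0 0 1 1
print(consec)        # 0 0 1 2
```

The last element of consec (2) matches the value reported for Child ab in the final output.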