R: conditional counter

Time: 2019-04-05 18:42:32

Tags: r dplyr counter

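I need a variable "minus_180_days" (a counter) that numbers each patient's visits in ascending order:

  1. The patient's first visit is 1.

  2. The second visit is 2 if it is less than 180 days after the patient's previous visit; if the 180-day condition is not met, the second visit must also show 1.

  3. The third visit is 3 if it is less than 180 days after the previous visit (visit "2"); if the 180-day condition is not met, the third visit is 1, and so on.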
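The rules above can be sketched for a single patient with a plain loop in base R. This is only an illustration of the requirement, not code from the thread: the helper name `visit_counter` and its `max_gap` argument are hypothetical, and it assumes a gap of exactly 180 days counts as "not less than 180" and therefore resets the counter.

```r
# Hypothetical helper (not from the original post): counter for ONE patient's
# visit dates, resetting to 1 whenever the gap to the previous visit is
# 180 days or more.
visit_counter <- function(dates, max_gap = 180) {
  dates <- sort(dates)
  # Gap in days to the previous visit; the first visit gets a gap of 0.
  gaps <- c(0, as.integer(diff(dates)))
  counter <- integer(length(dates))
  for (i in seq_along(dates)) {
    if (i == 1 || gaps[i] >= max_gap) {
      counter[i] <- 1L                    # first visit, or the 180-day condition failed
    } else {
      counter[i] <- counter[i - 1] + 1L   # continue the current run
    }
  }
  counter
}

# Patient 10 from the sample data, already converted to dates
p10 <- as.Date(c("2018-01-01", "2018-05-02", "2018-06-04",
                 "2018-12-05", "2019-11-10"))
visit_counter(p10)
# [1] 1 2 3 1 1
```

The gaps for this patient are 0, 121, 33, 184 and 340 days, so the counter runs 1, 2, 3 and then resets twice.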


Data

pacient <- c(10,10,10,10,10,11,11,12,12,12,13, 13, 15, 14); pacient
date <- as.Date(c("01/01/2018","02/05/2018", "04/06/2018", "10/11/2019", "05/12/2018", "02/01/2018", "06/08/2018", "01/01/2018", "03/01/2018", "06/03/2018", "05/08/2018", "05/08/2019", "05/07/2019", "08/07/2017"), format = "%d/%m/%Y"); date 
DF <- data.frame(pacient, date); DF

I have this code:

library(dplyr)

DF <- DF %>%
  group_by(pacient) %>%
  arrange(date) %>%
  mutate(days_visit = date - lag(date, default = first(date))) 
days_visit <- as.integer(DF$days_visit) 

DF <- DF[with(DF,order(pacient,date)),]

The output that I need (expected output):

3 answers:

Answer 0 (score: 4)

A dplyr solution, updated to reflect @Gregor's helpful comment:

DF2 <- DF %>%
  group_by(pacient) %>%
  arrange(pacient, date) %>%
  mutate(days_visit = (date - lag(date, default = first(date))) %>% as.integer,
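         # start a new run whenever the gap is not under 180 days
         # (>= keeps this consistent with the days_visit < 180 test below)
         new_count = cumsum(days_visit >= 180) + 1) %>%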
  group_by(pacient, new_count) %>%
  mutate(vis_num = row_number(),
         counter = case_when(vis_num == 1      ~ 1L,
                             days_visit < 180  ~ vis_num,
                             TRUE              ~ 1L))
> DF
# A tibble: 14 x 5
# Groups:   pacient [6]
   pacient date       days_visit vis_num counter
     <dbl> <date>          <int>   <int>   <int>
 1      10 2018-01-01          0       1       1
 2      10 2018-05-02        121       2       2
 3      10 2018-06-04         33       3       3
 4      10 2018-12-05        184       4       1
 5      10 2019-11-10        340       5       1
 6      11 2018-01-02          0       1       1
 7      11 2018-08-06        216       2       1
 8      12 2018-01-01          0       1       1
 9      12 2018-01-03          2       2       2
10      12 2018-03-06         62       3       3
11      13 2018-08-05          0       1       1
12      13 2019-08-05        365       2       1
13      14 2017-07-08          0       1       1
14      15 2019-07-05          0       1       1
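To see why the `cumsum()` step works, here is a minimal base R sketch on patient 10's `days_visit` values from the output above. The names `run_id` and `counter` are illustrative only, and the sketch assumes that a gap of exactly 180 days should reset the counter.

```r
# Gaps in days for patient 10, taken from the days_visit column above
days_visit <- c(0L, 121L, 33L, 184L, 340L)

# Each gap of 180 days or more bumps the running total, so rows that belong
# to the same uninterrupted run of visits share one run id.
run_id <- cumsum(days_visit >= 180) + 1
run_id
# [1] 1 1 1 2 3

# Numbering the rows within each run gives the counter.
counter <- ave(seq_along(days_visit), run_id, FUN = seq_along)
counter
# [1] 1 2 3 1 1
```

In the dplyr pipeline the same idea is expressed by grouping on `new_count` and calling `row_number()`.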

Answer 1 (score: 3)

A more concise tidyverse-based approach (adapted from @Gregor), including a fix for the bug @Gregor pointed out.

DF %>%
  arrange(pacient, date) %>%
  group_by(pacient) %>%
  mutate(days_visit = as.integer(date - lag(date, default = first(date))) ,
         less_180 = days_visit < 180,
         counter = ave(less_180, cumsum(less_180 == 0), FUN = seq_along)) 

# A tibble: 17 x 5
# Groups:   pacient [6]
   pacient date       days_visit less_180 counter
     <dbl> <date>          <int>    <dbl>   <dbl>
 1      10 2018-01-01          0        1       1
 2      10 2018-05-02        121        1       2
 3      10 2018-06-04         33        1       3
 4      10 2018-12-05        184        0       1
 5      10 2019-11-10        340        0       1
 6      10 2019-11-11          1        1       2
 7      10 2019-11-12          1        1       3
 8      10 2019-11-13          1        1       4
 9      11 2018-01-02          0        1       1
10      11 2018-08-06        216        0       1
11      12 2018-01-01          0        1       1
12      12 2018-01-03          2        1       2
13      12 2018-03-06         62        1       3
14      13 2018-08-05          0        1       1
15      13 2019-08-05        365        0       1
16      14 2017-07-08          0        1       1
17      15 2019-07-05          0        1       1

Answer 2 (score: 2)

This seems to work:

library(data.table)
setDT(DF)
setorder(DF, pacient, date)

# >= 180: a gap of exactly 180 days is not "less than 180", so it starts a new run
DF[, v := rowid(pacient, cumsum(date - shift(date, fill=first(date)) >= 180))]

    pacient       date v
 1:      10 2018-01-01 1
 2:      10 2018-05-02 2
 3:      10 2018-06-04 3
 4:      10 2018-12-05 1
 5:      10 2019-11-10 1
 6:      11 2018-01-02 1
 7:      11 2018-08-06 1
 8:      12 2018-01-01 1
 9:      12 2018-01-03 2
10:      12 2018-03-06 3
11:      13 2018-08-05 1
12:      13 2019-08-05 1
13:      14 2017-07-08 1
14:      15 2019-07-05 1
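For readers without data.table, the same grouping idea can be approximated in base R with `ave()`. This is a sketch under assumptions rather than a drop-in equivalent: the names `DF_base`, `gap` and `run` are made up for illustration, the run id is computed per patient instead of over the whole table, and a gap of exactly 180 days is treated as a reset.

```r
# Sample data from the question, rebuilt so the block is self-contained
pacient <- c(10,10,10,10,10,11,11,12,12,12,13,13,15,14)
date <- as.Date(c("01/01/2018","02/05/2018","04/06/2018","10/11/2019",
                  "05/12/2018","02/01/2018","06/08/2018","01/01/2018",
                  "03/01/2018","06/03/2018","05/08/2018","05/08/2019",
                  "05/07/2019","08/07/2017"), format = "%d/%m/%Y")
DF_base <- data.frame(pacient, date)
DF_base <- DF_base[order(DF_base$pacient, DF_base$date), ]

# Gap in days to the patient's previous visit (0 on each first visit)
gap <- ave(as.numeric(DF_base$date), DF_base$pacient,
           FUN = function(d) c(0, diff(d)))

# Run id per patient: increases each time the 180-day condition fails
run <- ave(as.integer(gap >= 180), DF_base$pacient, FUN = cumsum)

# Row number within each (patient, run) combination, like rowid(pacient, run)
DF_base$v <- ave(seq_along(gap), DF_base$pacient, run, FUN = seq_along)
DF_base$v
# [1] 1 2 3 1 1 1 1 1 2 3 1 1 1 1
```

The resulting `v` matches the column printed by the data.table version above.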

Testing with Gregor's more advanced data...

pacient2 <- c(10,10,10,10,10,10,10,10,11,11,12,12,12,13, 13, 15, 14)
date2 <- as.Date(c("01/01/2018","02/05/2018", "04/06/2018", "10/11/2019", "11/11/2019", "12/11/2019", "13/11/2019", "05/12/2018", "02/01/2018", "06/08/2018", "01/01/2018", "03/01/2018", "06/03/2018", "05/08/2018", "05/08/2019", "05/07/2019", "08/07/2017"), format = "%d/%m/%Y")
DF2 <- data.frame(pacient = pacient2, date = date2)

library(data.table)
setDT(DF2)
setorder(DF2, pacient, date)

# >= 180: a gap of exactly 180 days is not "less than 180", so it starts a new run
DF2[, v := rowid(pacient, cumsum(date - shift(date, fill=first(date)) >= 180))]

    pacient       date v
 1:      10 2018-01-01 1
 2:      10 2018-05-02 2
 3:      10 2018-06-04 3
 4:      10 2018-12-05 1
 5:      10 2019-11-10 1
 6:      10 2019-11-11 2
 7:      10 2019-11-12 3
 8:      10 2019-11-13 4
 9:      11 2018-01-02 1
10:      11 2018-08-06 1
11:      12 2018-01-01 1
12:      12 2018-01-03 2
13:      12 2018-03-06 3
14:      13 2018-08-05 1
15:      13 2019-08-05 1
16:      14 2017-07-08 1
17:      15 2019-07-05 1

I get a different result, but it seems to make sense. Let me know if anything is wrong.