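I need a variable "minus_180_days" (a counter), numbered in ascending order:
1 on the patient's first visit;
2 on the second visit if it falls less than 180 days after the patient's previous visit; if the 180-day condition is not met, the second visit must also show 1;
3 on the third visit if it falls less than 180 days after the previous visit (visit "2"); if the 180-day condition is not met, the third visit shows 1; and so on.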
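To make the rule concrete, here is a minimal base R sketch for a single patient. The helper name `visit_counter` and its `window` argument are made up for illustration (they are not part of the original post), and the dates are assumed to be sorted in ascending order.

```r
# Hypothetical helper (not from the original post): apply the counting rule
# to one patient's visit dates, assumed to be sorted in ascending order.
visit_counter <- function(dates, window = 180) {
  counter <- integer(length(dates))
  for (i in seq_along(dates)) {
    if (i == 1) {
      counter[i] <- 1L                      # first visit always starts at 1
    } else {
      gap <- as.integer(difftime(dates[i], dates[i - 1], units = "days"))
      # continue counting if the gap is under the window, otherwise restart
      counter[i] <- if (gap < window) counter[i - 1] + 1L else 1L
    }
  }
  counter
}

visits <- as.Date(c("2018-01-01", "2018-05-02", "2018-06-04", "2018-12-05"))
visit_counter(visits)
# the gaps are 121, 33 and 184 days, so the result is 1 2 3 1
```

A loop like this is slow on large data, but it is a handy reference to check a vectorised solution against.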
Data
pacient <- c(10,10,10,10,10,11,11,12,12,12,13, 13, 15, 14); pacient
date <- as.Date(c("01/01/2018","02/05/2018", "04/06/2018", "10/11/2019", "05/12/2018", "02/01/2018", "06/08/2018", "01/01/2018", "03/01/2018", "06/03/2018", "05/08/2018", "05/08/2019", "05/07/2019", "08/07/2017"), format = "%d/%m/%Y"); date
DF <- data.frame(pacient, date); DF
I have this code:
library(dplyr)

DF <- DF %>%
  group_by(pacient) %>%
  arrange(date) %>%
  # days since the same patient's previous visit (0 for the first visit)
  mutate(days_visit = date - lag(date, default = first(date)))
days_visit <- as.integer(DF$days_visit)
DF <- DF[with(DF, order(pacient, date)), ]
Answer 0 (score: 4)
A dplyr solution, updated to reflect @Gregor's helpful comments:
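library(dplyr)

DF2 <- DF %>%
  group_by(pacient) %>%
  arrange(pacient, date) %>%
  # days since the patient's previous visit (0 for the first visit)
  mutate(days_visit = (date - lag(date, default = first(date))) %>% as.integer,
         # start a new run at every gap of 180 days or more; >= makes this
         # the exact complement of the `days_visit < 180` test below
         new_count = cumsum(days_visit >= 180) + 1) %>%
  group_by(pacient, new_count) %>%
  mutate(vis_num = row_number(),
         counter = case_when(vis_num == 1 ~ 1L,
                             days_visit < 180 ~ vis_num,
                             TRUE ~ 1L))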
> DF
# A tibble: 14 x 5
# Groups:   pacient [6]
   pacient date       days_visit vis_num counter
     <dbl> <date>          <int>   <int>   <int>
 1      10 2018-01-01          0       1       1
 2      10 2018-05-02        121       2       2
 3      10 2018-06-04         33       3       3
 4      10 2018-12-05        184       4       1
 5      10 2019-11-10        340       5       1
 6      11 2018-01-02          0       1       1
 7      11 2018-08-06        216       2       1
 8      12 2018-01-01          0       1       1
 9      12 2018-01-03          2       2       2
10      12 2018-03-06         62       3       3
11      13 2018-08-05          0       1       1
12      13 2019-08-05        365       2       1
13      14 2017-07-08          0       1       1
14      15 2019-07-05          0       1       1
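One edge case worth checking in any solution of this kind is a gap of exactly 180 days. The question counts only gaps of less than 180 days as consecutive, so a 180-day gap should restart the counter. The sketch below uses made-up gaps (an assumption for illustration, not data from the post) to show how the direction of the comparison changes the result at the boundary.

```r
# Made-up gaps in days between consecutive visits of one patient
# (illustrative values, not taken from the original data)
gaps <- c(0, 180, 10)

# Restart only when the gap is strictly greater than 180 days
strict <- ave(gaps, cumsum(gaps > 180), FUN = seq_along)

# Restart when the gap is 180 days or more (matches "less than 180 days")
inclusive <- ave(gaps, cumsum(gaps >= 180), FUN = seq_along)

strict     # 1 2 3: the 180-day gap is treated as consecutive
inclusive  # 1 1 2: the 180-day gap restarts the counter
```

None of the gaps in the sample data is exactly 180 days, so both comparisons give the same counters there.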
Answer 1 (score: 3)
A more concise tidyverse-based approach (adapted from @Gregor), including a fix for the bug that @Gregor pointed out.
DF %>%
  arrange(pacient, date) %>%
  group_by(pacient) %>%
  mutate(days_visit = as.integer(date - lag(date, default = first(date))),
         less_180 = days_visit < 180,
         # restart the numbering at every visit that is not within 180 days
         counter = ave(less_180, cumsum(less_180 == 0), FUN = seq_along))
# A tibble: 17 x 5
# Groups:   pacient [6]
   pacient date       days_visit less_180 counter
     <dbl> <date>          <int>    <dbl>   <dbl>
 1      10 2018-01-01          0        1       1
 2      10 2018-05-02        121        1       2
 3      10 2018-06-04         33        1       3
 4      10 2018-12-05        184        0       1
 5      10 2019-11-10        340        0       1
 6      10 2019-11-11          1        1       2
 7      10 2019-11-12          1        1       3
 8      10 2019-11-13          1        1       4
 9      11 2018-01-02          0        1       1
10      11 2018-08-06        216        0       1
11      12 2018-01-01          0        1       1
12      12 2018-01-03          2        1       2
13      12 2018-03-06         62        1       3
14      13 2018-08-05          0        1       1
15      13 2019-08-05        365        0       1
16      14 2017-07-08          0        1       1
17      15 2019-07-05          0        1       1
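For anyone who wants to avoid the tidyverse dependency, the same `ave()` idea can be sketched in base R alone. This is an adaptation under my own assumptions, not code from the answer: the intermediate names `gap` and `run` are made up for illustration, and the data is the 14-row sample from the question.

```r
# Sample data from the question
pacient <- c(10, 10, 10, 10, 10, 11, 11, 12, 12, 12, 13, 13, 15, 14)
date <- as.Date(c("01/01/2018", "02/05/2018", "04/06/2018", "10/11/2019",
                  "05/12/2018", "02/01/2018", "06/08/2018", "01/01/2018",
                  "03/01/2018", "06/03/2018", "05/08/2018", "05/08/2019",
                  "05/07/2019", "08/07/2017"), format = "%d/%m/%Y")
DF <- data.frame(pacient, date)

# Sort by patient, then by visit date
DF <- DF[order(DF$pacient, DF$date), ]

# Days since the same patient's previous visit (0 for the first visit)
gap <- ave(as.numeric(DF$date), DF$pacient,
           FUN = function(d) c(0, diff(d)))

# Run id per patient: increases at every gap of 180 days or more
run <- ave(as.integer(gap >= 180), DF$pacient, FUN = cumsum)

# Number the visits within each (patient, run) pair
DF$counter <- as.integer(ave(gap, DF$pacient, run, FUN = seq_along))
DF
```

Grouping by both `DF$pacient` and `run` is what keeps the numbering from carrying over between patients.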
Answer 2 (score: 2)
This seems to work:
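library(data.table)
setDT(DF)
setorder(DF, pacient, date)
# rowid() numbers the rows within each (pacient, run) pair; a new run starts
# at every gap of 180 days or more (>= because the question counts only gaps
# of less than 180 days as consecutive)
DF[, v := rowid(pacient, cumsum(date - shift(date, fill=first(date)) >= 180))]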
    pacient       date v
 1:      10 2018-01-01 1
 2:      10 2018-05-02 2
 3:      10 2018-06-04 3
 4:      10 2018-12-05 1
 5:      10 2019-11-10 1
 6:      11 2018-01-02 1
 7:      11 2018-08-06 1
 8:      12 2018-01-01 1
 9:      12 2018-01-03 2
10:      12 2018-03-06 3
11:      13 2018-08-05 1
12:      13 2019-08-05 1
13:      14 2017-07-08 1
14:      15 2019-07-05 1
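Since the one-liner packs several steps together, here is a step-by-step sketch of the same idea. It is my own unpacking, not the answerer's code: the column names `gap` and `block` are made up for illustration, the gap is computed per patient with `by = pacient` (the one-liner computes it over the whole table and relies on `rowid()` to separate patients), and it uses only a six-row subset of the sample data.

```r
library(data.table)

# A six-row subset of the question's data (patients 10 and 11 only)
DT <- data.table(
  pacient = c(10, 10, 10, 10, 11, 11),
  date = as.Date(c("2018-01-01", "2018-05-02", "2018-06-04",
                   "2018-12-05", "2018-01-02", "2018-08-06"))
)
setorder(DT, pacient, date)

# Step 1: days since the same patient's previous visit (0 for the first)
DT[, gap := as.integer(date - shift(date, fill = first(date))), by = pacient]

# Step 2: block id per patient, increasing at every gap of 180 days or more
DT[, block := cumsum(gap >= 180), by = pacient]

# Step 3: number the rows within each (pacient, block) pair
DT[, v := rowid(pacient, block)]
DT
```

Keeping `gap` and `block` as columns makes it easier to see why a given visit restarts at 1.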
Testing with @Gregor's more extensive data...
pacient2 <- c(10,10,10,10,10,10,10,10,11,11,12,12,12,13, 13, 15, 14)
date2 <- as.Date(c("01/01/2018","02/05/2018", "04/06/2018", "10/11/2019", "11/11/2019", "12/11/2019", "13/11/2019", "05/12/2018", "02/01/2018", "06/08/2018", "01/01/2018", "03/01/2018", "06/03/2018", "05/08/2018", "05/08/2019", "05/07/2019", "08/07/2017"), format = "%d/%m/%Y")
DF2 <- data.frame(pacient = pacient2, date = date2)
library(data.table)
setDT(DF2)
setorder(DF2, pacient, date)
# same one-liner as above; >= so that a gap of exactly 180 days restarts at 1
DF2[, v := rowid(pacient, cumsum(date - shift(date, fill=first(date)) >= 180))]
    pacient       date v
 1:      10 2018-01-01 1
 2:      10 2018-05-02 2
 3:      10 2018-06-04 3
 4:      10 2018-12-05 1
 5:      10 2019-11-10 1
 6:      10 2019-11-11 2
 7:      10 2019-11-12 3
 8:      10 2019-11-13 4
 9:      11 2018-01-02 1
10:      11 2018-08-06 1
11:      12 2018-01-01 1
12:      12 2018-01-03 2
13:      12 2018-03-06 3
14:      13 2018-08-05 1
15:      13 2019-08-05 1
16:      14 2017-07-08 1
17:      15 2019-07-05 1
I get different results, but they seem to make sense. Let me know if anything looks wrong, anyone.