I have the following dataset:
id code date charge
1 AAA 01jan2016 23
1 BBB 20jan2016 45
1 CCC 19feb2018 23
1 DDD 20jan2019 123
1 EEE 02jan2016 43
1 FFF 12dec2015 12
2 AAA 07jan2017 12
2 BBB 08jan2017 32
2 CCC 06jan2017 12
2 DDD 10oct2019 12
3 AAA 12dec2014 12
3 BBB 18dec2014 12
3 CCC 01dec2014 13
How can I keep all records that fall within -30 to +90 days of the date of code AAA?
This is my expected output:
id code date charge
1 AAA 01jan2016 23
1 BBB 20jan2016 45
1 EEE 02jan2016 43
1 FFF 12dec2015 12
2 AAA 07jan2017 12
2 BBB 08jan2017 32
2 CCC 06jan2017 12
3 AAA 12dec2014 12
3 BBB 18dec2014 12
3 CCC 01dec2014 13
I tried using a date filter, but the date of AAA is different for every ID, so it does not work.
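Answer 0 (score: 1)
One option is to first convert 'Date' to the Date class (with mdy from lubridate), then group by 'ID' and keep the rows whose 'Date' lies between 30 days before and 90 days after the 'Date' of the row where 'Code' is "AAA":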
library(dplyr)
library(lubridate)
df1 %>%
  mutate(Date = mdy(Date)) %>%
  group_by(ID) %>%
  filter(between(Date, min(Date[Code == "AAA"]) - days(30),
                 min(Date[Code == "AAA"]) + days(90)))
# A tibble: 10 x 4
# Groups: ID [3]
# ID Code Date Charge
# <int> <chr> <date> <dbl>
# 1 1 AAA 2016-01-01 23
# 2 1 BBB 2016-01-20 45
# 3 1 EEE 2016-01-02 43
# 4 1 FFF 2015-12-12 12
# 5 2 AAA 2017-01-07 12
# 6 2 BBB 2017-01-08 32
# 7 2 CCC 2017-01-06 12
# 8 3 AAA 2014-12-12 12
# 9 3 BBB 2014-12-18 12
#10 3 CCC 2014-12-01 13
Data used in the example above:
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L), Code = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF",
"AAA", "BBB", "CCC", "DDD", "AAA", "BBB", "CCC"), Date = c("1/1/2016",
"1/20/2016", "2/19/2018", "1/20/2019", "1/2/2016", "12/12/2015",
"1/7/2017", "1/8/2017", "1/6/2017", "10/10/2019", "12/12/2014",
"12/18/2014", "12/1/2014"), Charge = c(23, 45, 23, 123, 43, 12,
12, 32, 12, 12, 12, 12, 13)), class = "data.frame", row.names = c(NA,
-13L))
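For readers who prefer not to load extra packages, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: the names `aaa`, `anchor`, `delta`, and `result` are made up here, the dates are assumed to be parsed as Date already, and when an ID has several "AAA" rows the earliest one is assumed to be the anchor (mirroring the `min()` used above). IDs without any "AAA" row are dropped.

```r
# Example data from the question, with dates already parsed as Date
df1 <- data.frame(
  ID   = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  Code = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF",
           "AAA", "BBB", "CCC", "DDD", "AAA", "BBB", "CCC"),
  Date = as.Date(c("2016-01-01", "2016-01-20", "2018-02-19", "2019-01-20",
                   "2016-01-02", "2015-12-12", "2017-01-07", "2017-01-08",
                   "2017-01-06", "2019-10-10", "2014-12-12", "2014-12-18",
                   "2014-12-01")),
  Charge = c(23, 45, 23, 123, 43, 12, 12, 32, 12, 12, 12, 12, 13),
  stringsAsFactors = FALSE
)

# One anchor date per ID: the earliest "AAA" date (assumption, see above)
aaa <- df1[df1$Code == "AAA", ]
aaa <- aaa[order(aaa$Date), ]               # earliest AAA rows first
anchor <- aaa$Date[match(df1$ID, aaa$ID)]   # match() returns the first hit per ID

# Signed distance in days from the anchor; NA when an ID has no "AAA" row
delta <- as.numeric(df1$Date - anchor)

# Keep rows from 30 days before to 90 days after the anchor
result <- df1[!is.na(delta) & delta >= -30 & delta <= 90, ]
print(result)
```

Computing `delta` once and filtering on it keeps the window bounds in one place, so changing -30 or 90 is a one-line edit.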
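Answer 1 (score: 1)
Here is a Stata solution. Within each id, the data are sorted by code and then date, so the first observation is the AAA record (AAA sorts first alphabetically in this data), and delta is each record's distance in days from that date: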
clear
input byte id str3 code float date int charge
1 "AAA" 20454 23
1 "BBB" 20473 45
1 "CCC" 21234 23
1 "DDD" 21569 123
1 "EEE" 20455 43
1 "FFF" 20434 12
2 "AAA" 20826 12
2 "BBB" 20827 32
2 "CCC" 20825 12
2 "DDD" 21832 12
3 "AAA" 20069 12
3 "BBB" 20075 12
3 "CCC" 20058 13
end
format %td date
bysort id (code date): generate delta = date - date[1]
keep if delta >= -30 & delta <= 90
Result:
list, sepby(id)
+----------------------------------------+
| id code date charge delta |
|----------------------------------------|
1. | 1 AAA 01jan2016 23 0 |
2. | 1 BBB 20jan2016 45 19 |
3. | 1 EEE 02jan2016 43 1 |
4. | 1 FFF 12dec2015 12 -20 |
|----------------------------------------|
5. | 2 AAA 07jan2017 12 0 |
6. | 2 BBB 08jan2017 32 1 |
7. | 2 CCC 06jan2017 12 -1 |
|----------------------------------------|
8. | 3 AAA 12dec2014 12 0 |
9. | 3 BBB 18dec2014 12 6 |
10. | 3 CCC 01dec2014 13 -11 |
+----------------------------------------+