Filter data based on code and date

Time: 2019-09-17 20:54:36

Tags: r excel stata

I have the following dataset:

id   code        date   charge  
 1    AAA   01jan2016       23  
 1    BBB   20jan2016       45  
 1    CCC   19feb2018       23  
 1    DDD   20jan2019      123  
 1    EEE   02jan2016       43  
 1    FFF   12dec2015       12  
 2    AAA   07jan2017       12  
 2    BBB   08jan2017       32  
 2    CCC   06jan2017       12  
 2    DDD   10oct2019       12  
 3    AAA   12dec2014       12  
 3    BBB   18dec2014       12  
 3    CCC   01dec2014       13  

How can I keep, for each id, all records dated within -30 to +90 days of the date of code AAA?

This is my expected output:

id   code        date   charge
 1    AAA   01jan2016       23
 1    BBB   20jan2016       45
 1    EEE   02jan2016       43
 1    FFF   12dec2015       12
 2    AAA   07jan2017       12
 2    BBB   08jan2017       32
 2    CCC   06jan2017       12
 3    AAA   12dec2014       12
 3    BBB   18dec2014       12
 3    CCC   01dec2014       13

I tried using a date filter, but the date of AAA is different for every ID, so it does not work.

2 answers:

Answer 0 (score: 1)

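One option is to first convert 'Date' to the Date class (with mdy from lubridate), then group by 'ID' and keep the rows whose 'Date' is between 30 days before and 90 days after the 'Date' of the row where 'Code' is "AAA".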

library(dplyr)
library(lubridate)
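df1 %>%
    mutate(Date = mdy(Date)) %>%    # parse the "m/d/Y" strings into Date
    group_by(ID) %>%
    # keep rows from 30 days before to 90 days after the AAA date of each ID
    filter(between(Date, min(Date[Code == "AAA"]) - days(30),
                   min(Date[Code == "AAA"]) + days(90)))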
# A tibble: 10 x 4
# Groups:   ID [3]
#      ID Code  Date       Charge
#   <int> <chr> <date>      <dbl>
# 1     1 AAA   2016-01-01     23
# 2     1 BBB   2016-01-20     45
# 3     1 EEE   2016-01-02     43
# 4     1 FFF   2015-12-12     12
# 5     2 AAA   2017-01-07     12
# 6     2 BBB   2017-01-08     32
# 7     2 CCC   2017-01-06     12
# 8     3 AAA   2014-12-12     12
# 9     3 BBB   2014-12-18     12
#10     3 CCC   2014-12-01     13
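
For readers without dplyr or lubridate, the same window logic can be sketched in base R. This is only an illustration, not part of the answer above: it assumes each id has exactly one AAA row, and the names df, aaa, anchor and delta are made up for the example. The data frame repeats the question's data with dates already in ISO format.

```r
# Rebuild the question's data (dates written as ISO strings for as.Date)
df <- data.frame(
  id     = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  code   = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF",
             "AAA", "BBB", "CCC", "DDD", "AAA", "BBB", "CCC"),
  date   = as.Date(c("2016-01-01", "2016-01-20", "2018-02-19", "2019-01-20",
                     "2016-01-02", "2015-12-12", "2017-01-07", "2017-01-08",
                     "2017-01-06", "2019-10-10", "2014-12-12", "2014-12-18",
                     "2014-12-01")),
  charge = c(23, 45, 23, 123, 43, 12, 12, 32, 12, 12, 12, 12, 13)
)

# One AAA row per id is assumed; look up that row's date for every record
aaa    <- df[df$code == "AAA", ]
anchor <- aaa$date[match(df$id, aaa$id)]

# Signed distance in days from the AAA date of the same id
delta  <- as.numeric(df$date - anchor)

# Keep records from 30 days before to 90 days after the AAA date
result <- df[delta >= -30 & delta <= 90, ]
print(result)
```

If an id could have several AAA rows, the lookup would need an explicit rule (for example the earliest AAA date, as min() does in the dplyr version).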

Data

df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L), Code = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF", 
"AAA", "BBB", "CCC", "DDD", "AAA", "BBB", "CCC"), Date = c("1/1/2016", 
"1/20/2016", "2/19/2018", "1/20/2019", "1/2/2016", "12/12/2015", 
"1/7/2017", "1/8/2017", "1/6/2017", "10/10/2019", "12/12/2014", 
"12/18/2014", "12/1/2014"), Charge = c(23, 45, 23, 123, 43, 12, 
12, 32, 12, 12, 12, 12, 13)), class = "data.frame", row.names = c(NA, 
-13L))

Answer 1 (score: 1)

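Here is a Stata solution. Sorting by code within id puts AAA first in this data, so date[1] is the AAA date of each id: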

clear
input byte id str3 code float date int charge
1 "AAA" 20454  23
1 "BBB" 20473  45
1 "CCC" 21234  23
1 "DDD" 21569 123
1 "EEE" 20455  43
1 "FFF" 20434  12
2 "AAA" 20826  12
2 "BBB" 20827  32
2 "CCC" 20825  12
2 "DDD" 21832  12
3 "AAA" 20069  12
3 "BBB" 20075  12
3 "CCC" 20058  13
end
format %td date

bysort id (code date): generate delta = date - date[1]
keep if delta >= -30 & delta <= 90

Result:

list, sepby(id)

     +----------------------------------------+
     | id   code        date   charge   delta |
     |----------------------------------------|
  1. |  1    AAA   01jan2016       23       0 |
  2. |  1    BBB   20jan2016       45      19 |
  3. |  1    EEE   02jan2016       43       1 |
  4. |  1    FFF   12dec2015       12     -20 |
     |----------------------------------------|
  5. |  2    AAA   07jan2017       12       0 |
  6. |  2    BBB   08jan2017       32       1 |
  7. |  2    CCC   06jan2017       12      -1 |
     |----------------------------------------|
  8. |  3    AAA   12dec2014       12       0 |
  9. |  3    BBB   18dec2014       12       6 |
 10. |  3    CCC   01dec2014       13     -11 |
     +----------------------------------------+