Filter out IDs between specified date ranges

Asked: 2018-10-22 15:46:25

Tags: r date filter id

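I'm trying to filter out client IDs from a data frame that appear within the first three months of my dataset but don't appear again after the first three months have ended, leaving me with only the client IDs that appear both before and after the first three months. I've included some code to create a mock dataset for illustration: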

    ClientId<-c('hgjj156','jksu990','ddks989','fghs676','shjk992','hddq141','huui667','kili1772','djjp8998','hdyy1122','fghs676','shjk992','hgjj156','jksu990')

    DateStamp<-c('01-01-2015', '01-01-2015', '03-01-2015', '10-01-2015', '22-01-2015', '29-01-2015','05-02-2015','11-02-2015', '19-02-2015', '17-03-2015', '02-04-2015', '06-04-2015', '08-04-2015', '09-04-2015')

    df<-cbind(ClientId, DateStamp)
    df

which should give you this:

    ClientId   DateStamp
    "hgjj156"  "01-01-2015"
    "jksu990"  "01-01-2015"
    "ddks989"  "03-01-2015"
    "fghs676"  "10-01-2015"
    "shjk992"  "22-01-2015"
    "hddq141"  "29-01-2015"
    "huui667"  "05-02-2015"
    "kili1772" "11-02-2015"
    "djjp8998" "19-02-2015"
    "hdyy1122" "17-03-2015"
    "fghs676"  "02-04-2015"
    "shjk992"  "06-04-2015"
    "hgjj156"  "08-04-2015"
    "jksu990"  "09-04-2015"

The idea is that I would be left with the following IDs:

    ClientId   DateStamp
    "hgjj156"  "01-01-2015"
    "jksu990"  "01-01-2015"
    "fghs676"  "10-01-2015"
    "shjk992"  "22-01-2015"
    "fghs676"  "02-04-2015"
    "shjk992"  "06-04-2015"
    "hgjj156"  "08-04-2015"
    "jksu990"  "09-04-2015"

Any thoughts on how I would achieve this? I've looked at dplyr and data.table solutions, but so far I haven't found one that fits. Many thanks in advance!

1 Answer:

Answer 0 (score: 0)

  

leaving me with client IDs that appear before and after the first three months

    library(data.table)

    # formatting
    DT = as.data.table(df)
    DT[, DateStamp := as.IDate(DateStamp, "%d-%m-%Y")]

    # set your thresholds
    d_rng = range(DT$DateStamp)
    d_dn = seq(d_rng[1], by="+3 months", length.out=2)[2]
    d_up = d_dn

    # find ids in each window
    c_dn = DT[DateStamp < d_dn, unique(ClientId)]
    c_up = DT[DateStamp >= d_up, unique(ClientId)]

    # filter
    DT[ClientId %in% intersect(c_dn, c_up)]

       ClientId  DateStamp
    1:  hgjj156 2015-01-01
    2:  jksu990 2015-01-01
    3:  fghs676 2015-01-10
    4:  shjk992 2015-01-22
    5:  fghs676 2015-04-02
    6:  shjk992 2015-04-06
    7:  hgjj156 2015-04-08
    8:  jksu990 2015-04-09

I'm borrowing from @GGrothendieck's answer on how to add/remove months.
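
For readers who would rather avoid an extra package, the same idea can be sketched in base R. This is only an illustration under a few assumptions: the cutoff is taken as three calendar months after the earliest date in the data (as in the data.table code above), and the names `df2`, `cutoff`, `early`, `late`, `keep` and `result` are made up for this example. It also builds a real data frame with parsed dates, since `cbind()` on two character vectors returns a character matrix.

```r
# Mock data from the question, with DateStamp parsed as Date (day-month-year)
ClientId <- c('hgjj156','jksu990','ddks989','fghs676','shjk992','hddq141','huui667',
              'kili1772','djjp8998','hdyy1122','fghs676','shjk992','hgjj156','jksu990')
DateStamp <- as.Date(c('01-01-2015','01-01-2015','03-01-2015','10-01-2015','22-01-2015',
                       '29-01-2015','05-02-2015','11-02-2015','19-02-2015','17-03-2015',
                       '02-04-2015','06-04-2015','08-04-2015','09-04-2015'),
                     format = "%d-%m-%Y")
df2 <- data.frame(ClientId, DateStamp, stringsAsFactors = FALSE)

# Assumed cutoff: three calendar months after the earliest date in the data
cutoff <- seq(min(df2$DateStamp), by = "3 months", length.out = 2)[2]

# IDs seen before the cutoff, and IDs seen on or after it
early <- unique(df2$ClientId[df2$DateStamp < cutoff])
late  <- unique(df2$ClientId[df2$DateStamp >= cutoff])

# Keep every row whose ID appears in both windows
keep   <- intersect(early, late)
result <- df2[df2$ClientId %in% keep, ]
result
```

With the mock data the cutoff works out to 2015-04-01, so `result` should hold the same eight rows as the data.table output. If the three-month window should be anchored somewhere other than the first observed date, only the `cutoff` line needs to change.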