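I am trying to filter a data frame down to the client IDs that appear within the first three months of my dataset and then appear again after those three months have ended. IDs that show up only in the first three months should be dropped, leaving me with the client IDs that occur both before and after the three-month mark. I have included some code to create a mock dataset for illustration: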
ClientId<-c('hgjj156','jksu990','ddks989','fghs676','shjk992','hddq141','huui667','kili1772','djjp8998','hdyy1122','fghs676','shjk992','hgjj156','jksu990')
DateStamp<-c('01-01-2015', '01-01-2015', '03-01-2015', '10-01-2015', '22-01-2015', '29-01-2015','05-02-2015','11-02-2015', '19-02-2015', '17-03-2015', '02-04-2015', '06-04-2015', '08-04-2015', '09-04-2015')
df<-cbind(ClientId, DateStamp)
df
Which should give you this:
ClientId DateStamp
"hgjj156" "01-01-2015"
"jksu990" "01-01-2015"
"ddks989" "03-01-2015"
"fghs676" "10-01-2015"
"shjk992" "22-01-2015"
"hddq141" "29-01-2015"
"huui667" "05-02-2015"
"kili1772" "11-02-2015"
"djjp8998" "19-02-2015"
"hdyy1122" "17-03-2015"
"fghs676" "02-04-2015"
"shjk992" "06-04-2015"
"hgjj156" "08-04-2015"
"jksu990" "09-04-2015"
The idea is to be left with the following IDs:
ClientId DateStamp
"hgjj156" "01-01-2015"
"jksu990" "01-01-2015"
"fghs676" "10-01-2015"
"shjk992" "22-01-2015"
"fghs676" "02-04-2015"
"shjk992" "06-04-2015"
"hgjj156" "08-04-2015"
"jksu990" "09-04-2015"
Any ideas on how I could achieve this? I have looked at dplyr and data.table solutions, but so far I have not found one that fits. Many thanks in advance!
Answer 0: (score: 0)
"leaving me with the client IDs that occur both before and after the three-month mark"
library(data.table)
# formatting
DT = as.data.table(df)
DT[, DateStamp := as.IDate(DateStamp, "%d-%m-%Y")]
# set your thresholds
d_rng = range(DT$DateStamp)
d_dn = seq(d_rng[1], by="+3 months", length.out=2)[2]
d_up = d_dn
# find ids in each window
c_dn = DT[DateStamp < d_dn, unique(ClientId)]
c_up = DT[DateStamp >= d_up, unique(ClientId)]
# filter
DT[ClientId %in% intersect(c_dn, c_up)]
ClientId DateStamp
1: hgjj156 2015-01-01
2: jksu990 2015-01-01
3: fghs676 2015-01-10
4: shjk992 2015-01-22
5: fghs676 2015-04-02
6: shjk992 2015-04-06
7: hgjj156 2015-04-08
8: jksu990 2015-04-09
I am borrowing from @GGrothendieck's answer on how to add/remove months.
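For comparison, the same filter can be sketched in base R without data.table. This is only an illustrative variant, not part of the answer above: it assumes the mock data is built as a data.frame (here named `dat`, a hypothetical name) rather than with cbind, and it assumes the cutoff is three calendar months after the earliest date in the data, as in the data.table code.

```r
# Mock data from the question, built as a data.frame instead of a character matrix
ClientId <- c('hgjj156','jksu990','ddks989','fghs676','shjk992','hddq141','huui667',
              'kili1772','djjp8998','hdyy1122','fghs676','shjk992','hgjj156','jksu990')
DateStamp <- c('01-01-2015','01-01-2015','03-01-2015','10-01-2015','22-01-2015',
               '29-01-2015','05-02-2015','11-02-2015','19-02-2015','17-03-2015',
               '02-04-2015','06-04-2015','08-04-2015','09-04-2015')
dat <- data.frame(ClientId,
                  DateStamp = as.Date(DateStamp, format = "%d-%m-%Y"),
                  stringsAsFactors = FALSE)

# Assumed cutoff: three calendar months after the earliest date in the data
cutoff <- seq(min(dat$DateStamp), by = "3 months", length.out = 2)[2]

# IDs seen before the cutoff, and IDs seen on or after it
early <- unique(dat$ClientId[dat$DateStamp < cutoff])
late  <- unique(dat$ClientId[dat$DateStamp >= cutoff])

# Keep every row whose ID appears in both windows
keep <- intersect(early, late)
res  <- dat[dat$ClientId %in% keep, ]
res
```

With the mock data, `res` should contain the same eight rows as the data.table result. Building the data with data.frame instead of cbind also keeps DateStamp as a real Date column, so the comparisons against `cutoff` are date comparisons rather than string comparisons.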