Filter out subsequent rows in a group after a condition is met in R

Time: 2017-06-26 14:29:01

Tags: r filter subset

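For the following sample data set, I need to remove any rows for a customer (CustomerID) that come after their first purchase (CustomerStatus = Purchased). Some customers never purchase the product, and I still want to keep all observations for those customers. The Date variable is important.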

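I am having difficulty removing rows within groups. The original data is not grouped this neatly; I have tried to simplify the problem I ran into. Any help is appreciated.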

I have provided a sample data set:

SalesPerson  CustomerID  Date       CustomerStatus
Amanda       2000       1/5/2017    Intro
Amanda       2000       1/6/2017    Email
Amanda       2000       1/15/2017   PhoneCall
Amanda       2000       2/15/2017   Purchased
Amanda       2001       1/3/2017    Intro
Amanda       2001       1/4/2017    Email
Amanda       2001       1/12/2017   PhoneCall
Amanda       2001       1/15/2017   Conference
Amanda       2001       2/4/2017    Purchased
Amanda       2001       3/17/2017   Meeting
Amanda       2001       3/20/2017   Email
Kyle         2002       1/19/2017   Intro
Kyle         2002       1/20/2017   Email
Kyle         2002       1/21/2017   PhoneCall
Sharon       2006       1/8/2017    Intro
Sharon       2006       1/10/2017   Meeting
Sharon       2006       1/19/2017   Purchased
Sharon       2006       1/30/2017   Conference
Sharon       2006       2/10/2017   Purchased

The output should look like this:

SalesPerson  CustomerID  Date       CustomerStatus
Amanda       2000       1/5/2017    Intro
Amanda       2000       1/6/2017    Email
Amanda       2000       1/15/2017   PhoneCall
Amanda       2000       2/15/2017   Purchased
Amanda       2001       1/3/2017    Intro
Amanda       2001       1/4/2017    Email
Amanda       2001       1/12/2017   PhoneCall
Amanda       2001       1/15/2017   Conference
Amanda       2001       2/4/2017    Purchased
Kyle         2002       1/19/2017   Intro
Kyle         2002       1/20/2017   Email
Kyle         2002       1/21/2017   PhoneCall
Sharon       2006       1/8/2017    Intro
Sharon       2006       1/10/2017   Meeting
Sharon       2006       1/19/2017   Purchased

1 answer:

Answer 0 (score: 2)

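We can group by 'SalesPerson' and 'CustomerID' and create a logical index for filter. Within each group, lag() shifts the CustomerStatus == "Purchased" flag down by one row, so the purchase row itself is not yet flagged. cumsum() then counts the purchases seen in earlier rows, and only rows where that count is still 0 are kept. This relies on the rows already being in date order within each customer.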

library(dplyr)

# Keep each customer's rows up to and including the first "Purchased" row
df1 %>%
  group_by(SalesPerson, CustomerID) %>%
  filter(cumsum(lag(CustomerStatus == "Purchased", default = FALSE)) < 1)
# A tibble: 15 x 4
# Groups:   SalesPerson, CustomerID [4]
#   SalesPerson CustomerID      Date CustomerStatus
#         <chr>      <int>     <chr>          <chr>
# 1      Amanda       2000  1/5/2017          Intro
# 2      Amanda       2000  1/6/2017          Email
# 3      Amanda       2000 1/15/2017      PhoneCall
# 4      Amanda       2000 2/15/2017      Purchased
# 5      Amanda       2001  1/3/2017          Intro
# 6      Amanda       2001  1/4/2017          Email
# 7      Amanda       2001 1/12/2017      PhoneCall
# 8      Amanda       2001 1/15/2017     Conference
# 9      Amanda       2001  2/4/2017      Purchased
#10        Kyle       2002 1/19/2017          Intro
#11        Kyle       2002 1/20/2017          Email
#12        Kyle       2002 1/21/2017      PhoneCall
#13      Sharon       2006  1/8/2017          Intro
#14      Sharon       2006 1/10/2017        Meeting
#15      Sharon       2006 1/19/2017      Purchased
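
The answer takes df1 as given and relies on the rows already being in date order within each customer. Since the question says the real data is not grouped this neatly, here is a minimal, self-contained sketch that rebuilds the sample data and sorts by a parsed date before applying the same filter. The explicit arrange() step, the temporary DateParsed column, and the name result are assumptions added for illustration; they are not part of the original answer.

```r
library(dplyr)

# Rebuild the sample data from the question (df1 is the name used in the answer).
df1 <- data.frame(
  SalesPerson = c(rep("Amanda", 11), rep("Kyle", 3), rep("Sharon", 5)),
  CustomerID = c(rep(2000L, 4), rep(2001L, 7), rep(2002L, 3), rep(2006L, 5)),
  Date = c("1/5/2017", "1/6/2017", "1/15/2017", "2/15/2017",
           "1/3/2017", "1/4/2017", "1/12/2017", "1/15/2017",
           "2/4/2017", "3/17/2017", "3/20/2017",
           "1/19/2017", "1/20/2017", "1/21/2017",
           "1/8/2017", "1/10/2017", "1/19/2017", "1/30/2017", "2/10/2017"),
  CustomerStatus = c("Intro", "Email", "PhoneCall", "Purchased",
                     "Intro", "Email", "PhoneCall", "Conference",
                     "Purchased", "Meeting", "Email",
                     "Intro", "Email", "PhoneCall",
                     "Intro", "Meeting", "Purchased", "Conference", "Purchased"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # Date is stored as month/day/year text, so parse it before sorting
  # (DateParsed is a hypothetical helper column, dropped again below).
  mutate(DateParsed = as.Date(Date, format = "%m/%d/%Y")) %>%
  arrange(SalesPerson, CustomerID, DateParsed) %>%
  group_by(SalesPerson, CustomerID) %>%
  # lag() delays the purchase flag by one row, so the first "Purchased" row
  # is kept and every row after it in the group is dropped.
  filter(cumsum(lag(CustomerStatus == "Purchased", default = FALSE)) < 1) %>%
  ungroup() %>%
  select(-DateParsed)

print(result)
```

Sorting on the raw Date text would order the rows alphabetically rather than chronologically, which is why the sketch parses the dates first. Customers with no "Purchased" row, such as 2002, keep all of their rows because the cumulative count never leaves 0.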