By group

Time: 2017-03-23 21:11:55

Tags: r date grouping

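Suppose I bill customers after the date of service and stop providing service if they have not paid their bill. However, the lag between the service date and the bill date makes this hard to enforce when a customer requests additional services. To determine whether a customer is delinquent, I need to know whether the date of a newly requested service falls after an outstanding bill was sent (the bill may go out well after the service date).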

Sample data

df <- structure(list(id = structure(c(1L, 2L, 3L, 4L, 1L, 1L, 2L, 3L, 2L, 2L), .Label = c("A", "B", "C", "D"), class = "factor"), service.date = structure(c(1L, 3L, 5L, 6L, 2L, 9L, 4L, 7L, 8L, 10L), .Label = c("2011-01-01", "2011-01-03", "2011-02-01", "2011-03-01", "2011-03-02", "2011-04-02", "2011-05-09", "2011-08-19", "2011-09-02", "2011-09-10"), class = "factor"), bill.date = structure(c(4L, 5L, 2L, 6L, 9L, 1L, 8L, 10L, 3L, 7L), .Label = c("2011-08-09", "2011-08-10", "2011-08-11", "2011-08-12", "2011-08-13", "2011-08-14", "2011-08-15", "2011-08-16", "2011-08-17", "2011-08-19"), class = "factor")), .Names = c("id", "service.date", "bill.date"), class = "data.frame", row.names = c(NA, -10L))

# df
# id  service.date     bill.date
# A   2011-01-01       2011-08-12
# B   2011-02-01       2011-08-13
# C   2011-03-02       2011-08-10
# D   2011-04-02       2011-08-14
# A   2011-01-03       2011-08-17
# A   2011-09-02       2011-08-09
# B   2011-03-01       2011-08-16
# C   2011-05-09       2011-08-19
# B   2011-08-19       2011-08-11
# B   2011-09-10       2011-08-15

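So a customer who requests additional services before the bill for their initial service has been sent is not considered delinquent. A customer who requests additional services after a bill has been issued, and while it is still unpaid, is considered delinquent.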

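Steps so far

My idea is to use a grouping function, perhaps by(), to find the first "bill.date" associated with each level of the factor variable "id", then check, for every "service.date" within that "id" level, whether it falls after that level's outstanding "bill.date", and finally create a logical variable from the result. Here is a sample of what I ultimately want: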

Desired result

df$delinquent <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)

#df

# id    service.date    bill.date   delinquent
# A     2011-01-01      2011-08-12   FALSE
# B     2011-02-01      2011-08-13   FALSE
# C     2011-03-02      2011-08-10   FALSE
# D     2011-04-02      2011-08-14   FALSE
# A     2011-01-03      2011-08-17   FALSE
# A     2011-09-02      2011-08-09   TRUE
# B     2011-03-01      2011-08-16   FALSE
# C     2011-05-09      2011-08-19   FALSE
# B     2011-08-19      2011-08-11   TRUE
# B     2011-09-10      2011-08-15   TRUE

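So the sample data has four "customers" (named A, B, C and D), two of whom (A and B) would be flagged as delinquent for receiving services despite having an unpaid bill.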
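The grouping idea described above can be sketched in base R with ave(), which applies a function within each level of a grouping variable and returns a vector as long as its input. This is only a minimal sketch: the data frame `billing` re-enters the ten sample rows with Date columns so the block runs on its own, and `billing` and `first.bill` are names introduced here for illustration.

```r
# The ten sample rows, re-entered with Date columns (illustrative copy of df)
billing <- data.frame(
  id = c("A", "B", "C", "D", "A", "A", "B", "C", "B", "B"),
  service.date = as.Date(c("2011-01-01", "2011-02-01", "2011-03-02", "2011-04-02",
                           "2011-01-03", "2011-09-02", "2011-03-01", "2011-05-09",
                           "2011-08-19", "2011-09-10")),
  bill.date = as.Date(c("2011-08-12", "2011-08-13", "2011-08-10", "2011-08-14",
                        "2011-08-17", "2011-08-09", "2011-08-16", "2011-08-19",
                        "2011-08-11", "2011-08-15"))
)

# Earliest bill date per customer, repeated on every row of that customer.
# The dates are converted to day counts so ave() works on a plain numeric vector.
first.bill <- ave(as.numeric(billing$bill.date), billing$id, FUN = min)

# A service requested after the customer's first bill went out is flagged
billing$delinquent <- as.numeric(billing$service.date) > first.bill
```

On the sample rows this reproduces the desired `delinquent` column shown above.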

2 answers:

Answer 0 (score: 3)

# Load dplyr (part of the tidyverse); library() fails loudly if it is missing
library(dplyr)

# Convert factor dates to actual dates
df <- df %>% mutate(service.date = as.Date(service.date),
                    bill.date = as.Date(bill.date))

# Flag a service as delinquent if its date is later than the earliest
# bill.date for that id, then drop the grouping and keep the result
df <- df %>% group_by(id) %>% mutate(delinquent = service.date > min(bill.date)) %>% ungroup()
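The conversion step matters because the sample data stores both date columns as factors, and relational operators are not defined for unordered factors. A small sketch of the difference, using two values taken from the sample data (`f` and `d` are names introduced here for illustration):

```r
# Two service dates from the sample data, stored as a factor like in df
f <- factor(c("2011-01-01", "2011-09-02"))

# Comparing unordered factors with > gives NA (R also emits a warning,
# silenced here so the sketch runs quietly)
factor.comparison <- suppressWarnings(f[2] > f[1])

# After conversion the comparison is a real date comparison
d <- as.Date(f)
date.comparison <- d[2] > d[1]
```

Here `factor.comparison` is NA while `date.comparison` is TRUE, which is why the answer converts both columns before grouping.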

Answer 1 (score: 2)

How about using data.table:

library(data.table)

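dt <- as.data.table(df)
# Within each id, in service-date order, flag every row from the first one
# whose service date is on or after its own bill date
dt[order(as.Date(service.date), as.Date(bill.date)),
   delinquent := cumsum(as.Date(service.date) >= as.Date(bill.date)) >= 1L,
   by = id]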


#    id service.date  bill.date delinquent
# 1:  A   2011-01-01 2011-08-12      FALSE
# 2:  B   2011-02-01 2011-08-13      FALSE
# 3:  C   2011-03-02 2011-08-10      FALSE
# 4:  D   2011-04-02 2011-08-14      FALSE
# 5:  A   2011-01-03 2011-08-17      FALSE
# 6:  A   2011-09-02 2011-08-09       TRUE
# 7:  B   2011-03-01 2011-08-16      FALSE
# 8:  C   2011-05-09 2011-08-19      FALSE
# 9:  B   2011-08-19 2011-08-11       TRUE
#10:  B   2011-09-10 2011-08-15       TRUE

This assumes you want to treat a customer as delinquent once they have been delinquent at least once in the past.

Edit: an approach that needs no sorting, inspired by @Vlo:

dt[, delinquent := as.Date(service.date) >= min(as.Date(bill.date)), by = id]
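The two snippets in this answer agree on the sample data, but they encode slightly different rules: the sorted version compares each service date with the bill date on the same row and carries the flag forward, while the unsorted version compares each service date with the customer's earliest bill date. A base R sketch of where they can diverge, using a hypothetical customer "Z" with made-up dates that are not part of the question's data (all names below are introduced here for illustration):

```r
# Hypothetical customer: each service is billed one month after it happens
z <- data.frame(
  id = c("Z", "Z"),
  service.date = as.Date(c("2011-01-01", "2011-03-01")),
  bill.date = as.Date(c("2011-02-01", "2011-04-01"))
)
z <- z[order(z$id, z$service.date, z$bill.date), ]

# Rule 1 (sorted version): flag from the first row whose service date is
# on or after the bill date on that same row
late.row <- as.numeric(z$service.date >= z$bill.date)
cumulative.rule <- ave(late.row, z$id, FUN = cumsum) >= 1

# Rule 2 (unsorted version): flag rows whose service date is on or after
# the customer's earliest bill date
first.bill <- ave(as.numeric(z$bill.date), z$id, FUN = min)
minimum.rule <- as.numeric(z$service.date) >= first.bill
```

For this customer `cumulative.rule` is FALSE for both rows, while `minimum.rule` flags the second service, because it was requested after the first bill had gone out. Which rule is appropriate depends on how "outstanding bill" is meant in the question.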