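I have a list of customer payment dates, and I am checking whether the payments follow a 7/14-day or monthly pattern, which they usually do! The problem is that there can also be intermediate payments of similar value, so just looking at the time between payments does not always work. Is there a simple way (using SQL or R) to classify customers as weekly or monthly payers?
This looks like a fairly simple signal-processing problem, but perhaps I don't know the right words to Google, because I can't find anything. Any pointers in the right direction would be much appreciated!
Sample data: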
CustomerID Payment Date
Customer1 2017-01-05
Customer1 2017-01-06
Customer1 2017-01-12
Customer1 2017-01-17
Customer1 2017-01-19
Customer1 2017-01-19
Customer1 2017-01-26
Customer1 2017-02-02
Customer1 2017-02-03
Customer2 2017-06-04
Customer2 2017-06-06
Customer2 2017-07-04
Customer2 2017-07-06
Customer2 2017-07-22
Customer2 2017-07-28
Customer2 2017-08-06
Sample output:
CustomerID Classification
Customer1 Weekly
Customer2 Monthly
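Edit: to be clear, the real data is usually much larger and noisier than the sample above. I am only looking for a general idea of an algorithm that finds the pattern, not a solution to the small dataset I posted.
Answer 0 (score: 0)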
payment_date <-
  as.Date(
    c(
      "2017-01-05",
      "2017-01-06",
      "2017-01-12",
      "2017-01-17",
      "2017-01-19",
      "2017-01-19",
      "2017-01-26",
      "2017-02-02",
      "2017-02-03",
      "2017-06-04",
      "2017-06-06",
      "2017-07-04",
      "2017-07-06",
      "2017-07-22",
      "2017-07-28",
      "2017-08-06"
    )
  )
df <- data.frame(payment_date,
                 customer_id = 0)
df$customer_id[1:9] <- 1
df$customer_id[10:16] <- 2
customer_information <- data.frame(customer_id = numeric(),
                                   payment = character())
# Loop over the ids that actually occur instead of assuming they are 1..n
for (i in unique(df$customer_id)) {
  dates <- df$payment_date[df$customer_id == i]
  # Days between this customer's first and last payment
  delta_t <- as.numeric(max(dates) - min(dates))
  nr_of_payments <- length(dates)
  # Average number of days per payment
  days_to_pay <- delta_t / nr_of_payments
  # More than 7 days per payment on average: monthly, otherwise weekly
  payment <- if (days_to_pay > 7) "monthly" else "weekly"
  to_add <- data.frame(customer_id = i,
                       payment = payment)
  customer_information <- rbind(customer_information, to_add)
}
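The code uses the average time per payment for each customer. If that average is more than 7 days, the customer is a monthly payer; otherwise the customer is a weekly payer.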
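To make the rule concrete, here is a minimal sketch that computes the same average for the sample data in one step. It rebuilds the data so that it runs on its own; the names `avg_days` and `classification` are only illustrative.

```r
# Recreate the sample data so the snippet is self-contained
df <- data.frame(
  customer_id = rep(c(1, 2), times = c(9, 7)),
  payment_date = as.Date(c(
    "2017-01-05", "2017-01-06", "2017-01-12", "2017-01-17", "2017-01-19",
    "2017-01-19", "2017-01-26", "2017-02-02", "2017-02-03",
    "2017-06-04", "2017-06-06", "2017-07-04", "2017-07-06", "2017-07-22",
    "2017-07-28", "2017-08-06"
  ))
)

# Days between first and last payment, divided by the number of payments
avg_days <- sapply(split(df$payment_date, df$customer_id), function(d) {
  as.numeric(max(d) - min(d)) / length(d)
})

# Same threshold as the loop: more than 7 days per payment means monthly
classification <- ifelse(avg_days > 7, "monthly", "weekly")
print(data.frame(avg_days, classification))
```

For the sample, Customer 1 spans 29 days over 9 payments (about 3.2 days per payment) and Customer 2 spans 63 days over 7 payments (9 days per payment), so the threshold of 7 separates them.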
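It works, but I don't think it is a satisfying solution. There seem to be two payments per month/week. If that is the case, you could take it into account to get more accurate results.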
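One possible direction for a more robust rule, sketched here as an assumption rather than as part of the answer above, is to fold the payment dates onto each candidate period and measure how tightly they cluster at one phase (the mean resultant length from circular statistics). Extra payments then only dilute the score instead of distorting the gaps. The helper names `phase_score` and `classify_customer`, the candidate periods, and the use of 365.25 / 12 days as a month are all hypothetical choices.

```r
# Hypothetical helper: how tightly do payments cluster at one phase of a period?
# Returns the mean resultant length: near 0 when spread out, 1 when all
# payments fall on the same phase of the period.
phase_score <- function(dates, period) {
  angle <- 2 * pi * (as.numeric(dates) %% period) / period
  sqrt(mean(cos(angle))^2 + mean(sin(angle))^2)
}

# Assumed candidate periods in days; 365.25 / 12 approximates a calendar month
periods <- c(weekly = 7, monthly = 365.25 / 12)

# Pick the candidate period with the highest clustering score
classify_customer <- function(dates) {
  scores <- sapply(periods, function(p) phase_score(dates, p))
  names(scores)[which.max(scores)]
}

# Sample data from the question
df <- data.frame(
  customer_id = rep(c(1, 2), times = c(9, 7)),
  payment_date = as.Date(c(
    "2017-01-05", "2017-01-06", "2017-01-12", "2017-01-17", "2017-01-19",
    "2017-01-19", "2017-01-26", "2017-02-02", "2017-02-03",
    "2017-06-04", "2017-06-06", "2017-07-04", "2017-07-06", "2017-07-22",
    "2017-07-28", "2017-08-06"
  ))
)

classification <- sapply(split(df$payment_date, df$customer_id),
                         classify_customer)
print(classification)
```

Two caveats apply to this sketch. A second regular payment at a very different phase of the period (for example the 1st and the 15th of each month) partly cancels the first and lowers the score. A 14-day payer also scores highly at 7 days, so adding a fortnightly candidate would need an extra tie-break.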