Question

My customer score data looks like this:

cust_id  score_date   score
   1       5/1/2016    80
   1       5/2/2016    83
   1       5/22/2016   90
   2       6/1/2016    92
   2       7/2/2016    87

I want to check each customer's score trend; that is, I want to check whether a customer's score increases over time (a positive trend).

I thought about using something like this (dplyr):

results <- df %>% 
           group_by(cust_id) %>%
           .[order(-.[, 2]), ]

But I'm not sure how to check the difference between scores.

I'd like the result to count the number of customers with a positive trend; something like:

      positive_trend (number of customers)
yes       1,000
no         78

Your help would be appreciated.

Answer 1

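Using dplyr: for each cust_id, we compute the differences between consecutive rows with diff, then use summarise to count the number of positive and negative differences.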

library(dplyr)
df %>%
  group_by(cust_id) %>%
  mutate(difference = c(0, diff(score))) %>%
  summarise(yes = sum(difference > 0), 
            no = sum(difference < 0))


#   cust_id   yes    no
#   <int>   <int>  <int>
#1    1       2      0
#2    2       0      1

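Note: with this code, the first row in each group is ignored, because there is no trend at the start (its difference is set to 0, so it counts as neither yes nor no).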
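To get the per-customer yes/no table the question asks for, one possible sketch is below. It rests on two assumptions that are not in the original post: that "positive trend" means a customer's latest score is higher than their earliest score, and that score_date is stored as month/day/year text. Because diff, first, and last all depend on row order, the rows are sorted by date first.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  cust_id    = c(1, 1, 1, 2, 2),
  score_date = c("5/1/2016", "5/2/2016", "5/22/2016", "6/1/2016", "7/2/2016"),
  score      = c(80, 83, 90, 92, 87),
  stringsAsFactors = FALSE
)

trend_counts <- df %>%
  # Parse the dates so that sorting is chronological, not alphabetical
  mutate(score_date = as.Date(score_date, format = "%m/%d/%Y")) %>%
  arrange(cust_id, score_date) %>%
  group_by(cust_id) %>%
  # Assumed definition: positive trend = latest score above earliest score
  summarise(positive_trend = last(score) > first(score)) %>%
  # Count how many customers fall into each category
  count(positive_trend)

trend_counts
```

Under this definition a customer with a single score is counted as not having a positive trend, since the first and last scores are the same value.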

Answer 2

We can do this with data.table:

library(data.table)
setDT(df)[,  as.list(table(factor(diff(score)>0, levels = c(TRUE, FALSE),
                                labels = c("yes", "no")))), cust_id]
#   cust_id yes no
#1:       1   2  0
#2:       2   0  1

Or with base R:

table(transform(stack(with(df, tapply(score, cust_id,
                    FUN = diff)))[2:1], values = values > 0))
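If the goal is one yes/no label per customer rather than a count of increases, a base R sketch could look like the following. It assumes a stricter definition than the question states, namely that every consecutive score must be higher than the previous one, and it assumes score_date is month/day/year text.

```r
# Sample data from the question
df <- data.frame(
  cust_id    = c(1, 1, 1, 2, 2),
  score_date = c("5/1/2016", "5/2/2016", "5/22/2016", "6/1/2016", "7/2/2016"),
  score      = c(80, 83, 90, 92, 87),
  stringsAsFactors = FALSE
)

# Sort by customer and date, because diff() depends on row order
df <- df[order(df$cust_id, as.Date(df$score_date, format = "%m/%d/%Y")), ]

# Assumed definition: every consecutive change is an increase.
# The length check is needed because all(logical(0)) is TRUE,
# which would label single-score customers as "yes".
is_up <- tapply(df$score, df$cust_id,
                function(s) length(s) > 1 && all(diff(s) > 0))

# Tabulate customers per category
trend_table <- table(positive_trend = factor(as.vector(is_up),
                                             levels = c(TRUE, FALSE),
                                             labels = c("yes", "no")))
trend_table
```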
