Find the mean of the top n transactions after sorting

Time: 2017-10-18 08:31:01

Tags: r

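Suppose my data frame has two columns, customer ID and transaction amount. For each unique customer ID, I want to sort the transaction amounts in descending order and then compute the mean of the top three transaction amounts in that sorted list.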

Cust_id     trans_amount
12345          100      
12345          200      
12345          170      
12345          300      
12345          250
12456          140        
12456          240       
12456          160       
12456          100          

The format I am looking for is:

Cust_id     trans_amount
12345          300               
12345          250      
12345          200      
12345          170      
12345          100
12456          240        
12456          160       
12456          140       
12456          100          

and from there get the mean of the top three, i.e.

Cust_id    mean_for_top_3
12345         250
12456         180

For the intermediate (sorting) step, I tried:

ddply(cust_data,.(cust_id.),summarize,sorted_amount=sort(trans_amount,,decreasing=TRUE))

but did not get the result. Please advise how I can get the desired output.

2 answers:

Answer 0: (score: 3)

A solution using data.table:

library(data.table)
setDT(cust_data)
cust_data_sort <- cust_data[, .(trans_amount = sort(trans_amount, decreasing = TRUE)), Cust_id]
cust_data_sort[, .(mean_for_top_3 = mean(head(trans_amount, 3))), Cust_id]
   Cust_id mean_for_top_3
1:   12345            250
2:   12456            180

If you don't need the sorted table cust_data_sort, you can compute the mean in a single step:

cust_data[, .(mean_for_top_3 = mean(head(sort(trans_amount, decreasing = TRUE), 3))), Cust_id]
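
For readers without data.table, the same sort-then-average idea can be sketched in base R. This is only an illustration, not part of the answer above: the helper name top_n_mean is hypothetical, and the data frame is rebuilt from the question's sample so the snippet runs on its own.

```r
# Sample data copied from the question
cust_data <- data.frame(
  Cust_id      = c(rep(12345, 5), rep(12456, 4)),
  trans_amount = c(100, 200, 170, 300, 250, 140, 240, 160, 100)
)

# Hypothetical helper: mean of the n largest values in x
top_n_mean <- function(x, n = 3) {
  mean(head(sort(x, decreasing = TRUE), n))
}

# Apply the helper to trans_amount within each Cust_id
result <- aggregate(trans_amount ~ Cust_id, data = cust_data, FUN = top_n_mean)
names(result)[2] <- "mean_for_top_3"
result
```

If a customer has fewer than n transactions, head() simply returns what is there, so the mean is taken over the available rows.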

Answer 1: (score: 1)

An idiomatic solution using dplyr:

df <- read.table(text = "Cust_id     trans_amount
                 12345          100
                 12345          200
                 12345          170
                 12345          300
                 12345          250
                 12456          140
                 12456          240
                 12456          160
                 12456          100", header = TRUE)


library(dplyr)

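df %>% group_by(Cust_id) %>% 
  arrange(desc(trans_amount), .by_group = TRUE) %>%
  # name the ranking column explicitly instead of relying on the last-column default
  top_n(n = 3, wt = trans_amount) %>%
  summarize(mean = mean(trans_amount))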

# A tibble: 2 x 2
  Cust_id  mean
    <int> <dbl>
1   12345   250
2   12456   180
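
One caveat worth illustrating: top_n() keeps tied rows, so a group with ties at the third position can contribute more than three values to the mean. A minimal sketch of a stricter variant, assuming dplyr 1.0.0 or later where slice_max() is available; the result name top3 is illustrative and the data is rebuilt from the question's sample:

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  Cust_id      = c(rep(12345L, 5), rep(12456L, 4)),
  trans_amount = c(100, 200, 170, 300, 250, 140, 240, 160, 100)
)

top3 <- df %>%
  group_by(Cust_id) %>%
  # keep exactly the 3 largest amounts per customer, breaking ties by row order
  slice_max(trans_amount, n = 3, with_ties = FALSE) %>%
  summarise(mean_for_top_3 = mean(trans_amount))

top3
```

On this sample there are no ties, so the result matches the top_n() version; the two only differ when amounts repeat at the cutoff.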

An alternative that also reports the transaction count per customer:

df %>% group_by(Cust_id) %>% 
  #arrange(desc(trans_amount), .by_group = T) %>% 
  mutate(count = n()) %>%
  top_n(n = 3, wt = trans_amount) %>%
  mutate(mean = mean(trans_amount)) %>%
  select(Cust_id,mean,count) %>% distinct()

# A tibble: 2 x 3
# Groups:   Cust_id [2]
  Cust_id  mean count
    <int> <dbl> <int>
1   12345   250     5
2   12456   180     4