Spark SQL: sum and group by multiple columns

Time: 2018-05-06 12:46:05

Tags: apache-spark-sql

I have a table with a "customerId" column and a classification column with values such as weekly buyer, monthly buyer, and yearly buyer.

The classification of each order, based on the number of days since the customer's previous order, is already done:

select customerId,
       dayofyear(date) - lag(dayofyear(date)) over(partition by customerId order by date) as Differenz,
       case when (dayofyear(date) - lag(dayofyear(date)) over(partition by customerId order by date)) <= 9 then "weekly buyer"
            when (dayofyear(date) - lag(dayofyear(date)) over(partition by customerId order by date)) between 10 and 19 then "every two weeks"
            when (dayofyear(date) - lag(dayofyear(date)) over(partition by customerId order by date)) between 20 and 40 then "monthly buyer"
            when (dayofyear(date) - lag(dayofyear(date)) over(partition by customerId order by date)) > 40 then "occasional buyer"
            else null
       end as Einstufung
from orders
order by customerId, date

I want to know whether a customer is a weekly, every-two-weeks, monthly, or occasional buyer.

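A customer can order multiple times, so one customerId can have several classifications, for example weekly, weekly, and monthly once. All in all, that customer is a weekly buyer. How can I define this in SQL?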
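One possible way to reduce the per-order labels to a single label per customer is sketched below. It assumes that "all in all" means the classification that occurs most often for that customer, which is an interpretation and not something stated above. Two further assumptions are labeled in the comments: the gap is computed with `datediff` instead of subtracting `dayofyear` values (the subtraction turns negative when two consecutive orders fall in different years), and ties are broken alphabetically, which is arbitrary. The CTE names `gaps`, `classified`, `counted`, and `ranked` and the column `n` are made up for this example.

```sql
-- Sketch only: pick the most frequent per-order classification for each customer.
WITH gaps AS (
  SELECT
    customerId,
    -- Assumption: datediff replaces the dayofyear subtraction so that
    -- gaps spanning a year boundary stay positive.
    datediff(date, lag(date) OVER (PARTITION BY customerId ORDER BY date)) AS Differenz
  FROM orders
),
classified AS (
  SELECT
    customerId,
    CASE
      WHEN Differenz <= 9 THEN 'weekly buyer'
      WHEN Differenz BETWEEN 10 AND 19 THEN 'every two weeks'
      WHEN Differenz BETWEEN 20 AND 40 THEN 'monthly buyer'
      WHEN Differenz > 40 THEN 'occasional buyer'
      ELSE NULL  -- first order of a customer has no previous order
    END AS Einstufung
  FROM gaps
),
counted AS (
  -- Count how often each classification occurs per customer.
  SELECT customerId, Einstufung, count(*) AS n
  FROM classified
  WHERE Einstufung IS NOT NULL
  GROUP BY customerId, Einstufung
),
ranked AS (
  SELECT
    customerId,
    Einstufung,
    n,
    -- Assumption: ties are broken alphabetically by label.
    row_number() OVER (PARTITION BY customerId ORDER BY n DESC, Einstufung) AS rn
  FROM counted
)
SELECT customerId, Einstufung, n
FROM ranked
WHERE rn = 1
ORDER BY customerId;
```

With this query a customer whose gaps classify as weekly, weekly, and monthly would come out as a weekly buyer. Customers with only one order have no gap and therefore do not appear in the result.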

I simplified the table in Excel:

[Image: simplified table from Excel, not available]

0 answers:

No answers yet