RFM analysis using PostgreSQL

Asked: 2017-03-24 02:47:31

Tags: sql postgresql heroku heroku-postgres

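I am trying to build an RFM analysis with a PostgreSQL query, but I have not managed to finish the query for the Recency dimension. The approach is inspired by this article: https://cooldata.wordpress.com/2014/03/25/an-all-sql-way-to-automate-rfm-scoring/

The criteria for the recency dimension are: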

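  1. Last order within 2 months = 5
  2. Last order within 4 months = 4
  3. Last order within 6 months = 3
  4. Last order within 8 months = 2
  5. Last order within 10 months = 1

Below is the query I have been trying to complete: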

    WITH rfm AS
    
    (SELECT email,
     SUM((total_incl_tax)) AS cash,
     MAX(decode(order_order.order_date, 2016-01-01, 5, 2016-02-01, 4, 2016-03-01, 3, 2016-04-01, 2, 201605-01, 1)) AS recency,
     COUNT(DISTINCT(order_date)) AS frequency
     FROM order_order
    
     GROUP BY email)
    
    
    SELECT rfm.email,
    CASE
    WHEN rfm.cash >= 2000000 THEN 5
    WHEN rfm.cash > 1500000 THEN 4
    WHEN rfm.cash > 1000000 THEN 3
    WHEN rfm.cash > 500000 THEN 2
    WHEN rfm.frequency > 4 THEN 5
    WHEN rfm.frequency = 4 THEN 4
    WHEN rfm.frequency = 3 THEN 3
    WHEN rfm.frequency = 2 THEN 2
    WHEN rfm.frequency = 1 THEN 1
    else 1
    
    END  + rfm.frequency AS rfm_score
    --+ Five_years.recency
    
    FROM rfm
    GROUP BY rfm.email, rfm.cash,rfm.frequency
    ORDER BY rfm.email
    

The error is:

    ERROR: function decode(timestamp with time zone, integer, integer, integer, integer, integer, integer, integer, integer, integer, integer) does not exist Hint: No function matches the given name and argument types. You might need to add explicit type casts. Position: 186
    

I assume the error is in this line:

    MAX(decode(order_order.order_date, 2016-01-01, 5, 2016-02-01, 4, 2016-03-01, 3, 2016-04-01, 2, 2016-05-01, 1)) AS recency
    

Are there any suggestions for changing the failing line so that it implements the recency criteria described above? Thanks.

1 answer:

Answer 0 (score: 2)

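Postgres has no Oracle-style decode() function. Its built-in decode(text, text) converts encoded text to bytea, which is why no overload matches the arguments in the error message. You can replace the call with another CASE expression: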

    WITH rfm AS (
        SELECT email,
               SUM(total_incl_tax) AS cash,
               -- CASE replaces the Oracle-style decode(); date literals must be quoted
               MAX(
                   CASE
                       WHEN order_order.order_date = '2016-01-01' THEN 5
                       WHEN order_order.order_date = '2016-02-01' THEN 4
                       WHEN order_order.order_date = '2016-03-01' THEN 3
                       WHEN order_order.order_date = '2016-04-01' THEN 2
                       WHEN order_order.order_date = '2016-05-01' THEN 1
                   END
               ) AS recency,
               COUNT(DISTINCT order_date) AS frequency
        FROM order_order
        GROUP BY email
    )
    SELECT rfm.email,
           CASE
               WHEN rfm.cash >= 2000000 THEN 5
               WHEN rfm.cash > 1500000 THEN 4
               WHEN rfm.cash > 1000000 THEN 3
               WHEN rfm.cash > 500000 THEN 2
               WHEN rfm.frequency > 4 THEN 5
               WHEN rfm.frequency = 4 THEN 4
               WHEN rfm.frequency = 3 THEN 3
               WHEN rfm.frequency = 2 THEN 2
               WHEN rfm.frequency = 1 THEN 1
               ELSE 1
           END + rfm.frequency + rfm.recency AS rfm_score
    FROM rfm
    -- rfm.recency is selected, so it must also appear in the GROUP BY
    GROUP BY rfm.email, rfm.cash, rfm.frequency, rfm.recency
    ORDER BY rfm.email
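
Note that the CASE above only matches orders placed exactly on the listed dates, so recency is NULL for every other customer. The criteria in the question ("last order within N months") could instead be expressed by comparing each customer's latest order with an interval. The following is a minimal sketch, not a definitive implementation: the table and column names come from the question, while using now() as the reference point and scoring anything older than 10 months as 0 are assumptions.

```sql
-- Sketch: recency score derived from each customer's most recent order.
WITH rfm AS (
    SELECT email,
           MAX(order_date) AS last_order
    FROM order_order
    GROUP BY email
)
SELECT email,
       CASE
           -- Branches are evaluated top to bottom, so the first match wins.
           WHEN last_order >= now() - INTERVAL '2 months'  THEN 5
           WHEN last_order >= now() - INTERVAL '4 months'  THEN 4
           WHEN last_order >= now() - INTERVAL '6 months'  THEN 3
           WHEN last_order >= now() - INTERVAL '8 months'  THEN 2
           WHEN last_order >= now() - INTERVAL '10 months' THEN 1
           ELSE 0  -- assumption: the question gives no score beyond 10 months
       END AS recency_score
FROM rfm
ORDER BY email;
```

To score against a fixed reporting date rather than the current time, now() could be replaced with a literal such as TIMESTAMPTZ '2016-06-01'; that date is only an illustration.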

Further reading: Decode equivalent in postgres
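
As a side note on the error message: it lists the decode() arguments as integers because an unquoted 2016-01-01 is parsed as integer subtraction, not as a date. A small check that can be run in any Postgres session (the column aliases are illustrative only):

```sql
SELECT 2016-01-01            AS unquoted,       -- 2016 - 1 - 1 = 2014
       pg_typeof(2016-01-01) AS unquoted_type,  -- integer
       DATE '2016-01-01'     AS quoted;         -- an actual date value
```

This is why the date literals need quotes once they are moved into a CASE expression.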