RFM analysis not outputting all customer IDs

Time: 2019-12-06 18:52:16

Tags: sql amazon-redshift aginity

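So I'm doing an RFM analysis, and with a lot of help I was able to put together the following query. It outputs the customer_id, the R score, the F score, the M score, and finally the combined RFM score: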

--First apply the conditions and aggregate per customer
--Then create quintiles using the ntile function
--Then combine the scores
--Then the substrings separate out each score's individual digit

SELECT *,
    SUBSTRING(rfm_combined::VARCHAR, 1, 1) AS recency_score,
    SUBSTRING(rfm_combined::VARCHAR, 2, 1) AS frequency_score,
    SUBSTRING(rfm_combined::VARCHAR, 3, 1) AS monetary_score
FROM (
    SELECT
        customer_id,
        rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
    FROM (
        SELECT
            customer_id,
            NTILE(5) OVER (ORDER BY last_order_date) AS rfm_recency,
            NTILE(5) OVER (ORDER BY count_order) AS rfm_frequency,
            NTILE(5) OVER (ORDER BY total_spent) AS rfm_monetary
        FROM (
            SELECT
                customer_id,
                MAX(oms_order_date) AS last_order_date,
                COUNT(*) AS count_order,
                SUM(quantity_ordered * unit_price_amount) AS total_spent
            FROM
                l_dmw_order_report
            WHERE
                order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
                AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
                AND line_status NOT IN ('CANCELLED', 'HOLD')
                AND oms_order_date BETWEEN '2019-01-01' AND CURRENT_DATE
                AND customer_id = 'US621111112234061'
            GROUP BY customer_id
        ) AS customer_totals
    ) AS quintiles
) AS combined
ORDER BY customer_id DESC

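You'll notice above that I force it to output only a specific customer_id. That's because I want to test whether the query accounts for a customer_id that appears in multiple YearMonth categories (since they could have purchased in January, again in February, and again in November).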

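The problem is that, although the query outputs the correct scores, it seems to count the customer_id only once, regardless of whether it appears in multiple months. This particular customer ID appears in January 2019, February 2019, and November 2019, so I expect 3 rows instead of just 1. I've been testing for hours and can't find the reason, but I suspect my grouping may be wrong.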
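The row count follows directly from the GROUP BY key. Here is a minimal sketch, again using SQLite via Python's sqlite3 as a stand-in for Redshift, that contrasts grouping by customer_id alone with grouping by customer_id plus a month key. The three order dates and amounts are invented to match the January, February, and November 2019 pattern described above. SQLite's strftime('%Y-%m', ...) builds the month key here; in Redshift a comparable key could come from something like DATE_TRUNC('month', oms_order_date).

```python
import sqlite3

CUSTOMER = "US621111112234061"

# Hypothetical orders for one customer in Jan, Feb and Nov 2019 (dates and amounts invented).
MONTHLY_ORDERS = [
    (CUSTOMER, "2019-01-15", 1, 10),
    (CUSTOMER, "2019-02-03", 2, 10),
    (CUSTOMER, "2019-11-28", 1, 30),
]


def grouped_rows(orders):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE l_dmw_order_report (customer_id TEXT, oms_order_date TEXT, "
        "quantity_ordered INTEGER, unit_price_amount INTEGER)"
    )
    conn.executemany("INSERT INTO l_dmw_order_report VALUES (?, ?, ?, ?)", orders)

    # Grouping key is customer_id only: every month collapses into a single row.
    per_customer = conn.execute(
        "SELECT customer_id, COUNT(*), SUM(quantity_ordered * unit_price_amount) "
        "FROM l_dmw_order_report WHERE customer_id = ? GROUP BY customer_id",
        (CUSTOMER,),
    ).fetchall()

    # Adding the month to the grouping key gives one row per customer per month.
    per_month = conn.execute(
        "SELECT customer_id, strftime('%Y-%m', oms_order_date) AS year_month, "
        "COUNT(*), SUM(quantity_ordered * unit_price_amount) "
        "FROM l_dmw_order_report WHERE customer_id = ? "
        "GROUP BY customer_id, strftime('%Y-%m', oms_order_date) "
        "ORDER BY year_month",
        (CUSTOMER,),
    ).fetchall()
    conn.close()
    return per_customer, per_month


per_customer, per_month = grouped_rows(MONTHLY_ORDERS)
print(per_customer)
print(per_month)
```

The first query returns one row covering all three orders; the second returns three. Whether the RFM quintiles should rank customer-months rather than customers is a separate design choice; this sketch only shows where the single row comes from.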

Thanks for your help, and let me know if you have any questions!

Best,

Z

0 answers:

No answers