I am doing a cohort analysis using the table TRANSACTIONS. Below is the table schema,
USER_ID NUMBER,
PAYMENT_DATE_UTC DATE,
IS_PAYMENT_ADDED BOOLEAN
Below is a quick query to see how USER_ID 12345 (an example) falls into different cohorts depending on the date filter provided,
WITH RESULT AS (
SELECT
USER_ID,
TO_DATE(PAYMENT_DATE_UTC) AS PAYMENT_DATE,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY 1,2
HAVING PAYMENT_ADDED_COUNT>=1
ORDER BY 2
)
SELECT
COUNT(DISTINCT r.USER_ID),
SUM(r.PAYMENT_ADDED_COUNT)
FROM RESULT r
WHERE r.USER_ID=12345
AND (r.PAYMENT_DATE>='2021-02-01' AND r.PAYMENT_DATE<'2021-02-15')
The result of this query for the (two-week) time range is
| 1 | 55 |
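and for the date filter provided this USER_ID would be classified into the regular user cohort (the cohort with more than 10 payments).
If the same query is run for a different time range, say starting at '2021-02-07', the result would be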
| 1 | 10 |
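and for the date filter provided this USER_ID would be classified into the occasional user cohort (users with 1 to 10 payments).
I have the following query, which puts a USER_ID into one of two cohorts based on the total number of payments added,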
WITH
ALL_USER_COHORT AS
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
),
OCASSIONAL_USER_COHORT AS
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
HAVING (PAYMENT_ADDED_COUNT>=1 AND PAYMENT_ADDED_COUNT<=10)
),
REGULAR_USER_COHORT AS
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
HAVING PAYMENT_ADDED_COUNT>10
)
SELECT
COUNT(DISTINCT ou.USER_ID) AS "OCCASIONAL USERS",
COUNT(DISTINCT ru.USER_ID) AS "REGULAR USERS"
FROM ALL_USER_COHORT au
LEFT JOIN OCASSIONAL_USER_COHORT ou ON au.USER_ID=ou.USER_ID
LEFT JOIN REGULAR_USER_COHORT ru ON au.USER_ID=ru.USER_ID
LEFT JOIN TRANSACTIONS t ON au.USER_ID=t.USER_ID
WHERE au.USER_ID=12345
AND TO_DATE(t.PAYMENT_DATE_UTC)>='2021-02-07'
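Ideally, USER_ID 12345 should be classified as "OCCASIONAL USERS" for the date filter provided, but the query classifies it as "REGULAR USERS".
Answer 0 (score: 1)
For starters, the redundancy in the CTEs can be removed like so: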
WITH all_user_cohort AS (
SELECT
USER_ID,
SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
FROM transactions
GROUP BY user_id
), ocassional_user_cohort AS (
SELECT * FROM all_user_cohort
WHERE PAYMENT_ADDED_COUNT between 1 AND 10
), regular_user_cohort AS (
SELECT * FROM all_user_cohort
WHERE PAYMENT_ADDED_COUNT > 10
)
SELECT
COUNT(DISTINCT ou.user_id) AS "OCCASIONAL USERS",
COUNT(DISTINCT ru.user_id) AS "REGULAR USERS"
FROM all_user_cohort AS au
LEFT JOIN ocassional_user_cohort ou ON au.user_id=ou.user_id
LEFT JOIN regular_user_cohort ru ON au.user_id=ru.user_id
LEFT JOIN transactions t ON au.user_id=t.user_id
WHERE au.user_id=12345
AND TO_DATE(t.payment_date_utc)>='2021-03-01'
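But the reason you are having this problem is that the cohorts are computed over all time: the date filter is applied only after payment_added_count has already been summed, so it never changes the counts.
What you want is to move the date filter into all_user_cohort, and instead of building extra cohort tables you can simply sum the rows that meet each condition.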
WITH all_user_cohort AS (
SELECT
USER_ID,
SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
FROM transactions
WHERE TO_DATE(payment_date_utc)>='2021-03-01'
GROUP BY user_id
)
SELECT
SUM(IFF(payment_added_count between 1 AND 10, 1,0)) AS "OCCASIONAL USERS",
SUM(IFF(payment_added_count > 10, 1,0)) AS "REGULAR USERS"
FROM all_user_cohort AS au
WHERE au.user_id=12345
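To see the effect without touching real data, here is a minimal sketch that compares counting before and after the date filter. The sample_transactions CTE and its 12 rows are made up for illustration (one payment per day from 2021-02-20 through 2021-03-03) and only assume the schema from the question.

```sql
WITH sample_transactions AS (
    -- Hypothetical sample data: 12 payments by user 12345,
    -- one per day from 2021-02-20 through 2021-03-03.
    SELECT
        12345 AS user_id,
        DATEADD(day, ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1, '2021-02-20'::DATE) AS payment_date_utc,
        TRUE AS is_payment_added
    FROM TABLE(GENERATOR(ROWCOUNT => 12))
), unfiltered AS (
    -- Counts every payment the user ever made (what the original cohort CTEs do).
    SELECT user_id, SUM(IFF(is_payment_added = TRUE, 1, 0)) AS payment_added_count
    FROM sample_transactions
    GROUP BY user_id
), filtered AS (
    -- Counts only payments inside the date window, because the
    -- filter runs before the aggregation.
    SELECT user_id, SUM(IFF(is_payment_added = TRUE, 1, 0)) AS payment_added_count
    FROM sample_transactions
    WHERE payment_date_utc >= '2021-03-01'
    GROUP BY user_id
)
SELECT 'all time' AS scope, payment_added_count FROM unfiltered
UNION ALL
SELECT 'from 2021-03-01' AS scope, payment_added_count FROM filtered;
```

With this sample data the all-time count is 12 (regular) and the windowed count is 3 (occasional), which is the same kind of mismatch as in the question: filtering after the GROUP BY leaves the per-user count untouched.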
It can also be done a different way, if that better suits your needs, or for other reasons.
WITH all_user_cohort AS (
SELECT
USER_ID,
SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
FROM transactions
WHERE TO_DATE(payment_date_utc)>='2021-03-01'
GROUP BY user_id
), classify_users AS (
SELECT user_id
,CASE
WHEN payment_added_count between 1 AND 10 THEN 'OCCASIONAL USERS'
WHEN payment_added_count > 10 THEN 'REGULAR USERS'
ELSE 'users with zero payments'
END AS classified
FROM all_user_cohort
)
SELECT classified
,count(*)
FROM classify_users
WHERE user_id=12345
GROUP BY 1
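If you need a bounded window like the two-week range in the question, the same pattern works with both bounds inside the CTE. This is only a sketch: the session variables start_date and end_date are names I made up for illustration, and you could just as well inline the literals.

```sql
-- Hypothetical session variables holding the window bounds.
SET start_date = '2021-02-01';
SET end_date   = '2021-02-15';

WITH all_user_cohort AS (
    SELECT
        user_id,
        SUM(IFF(is_payment_added = TRUE, 1, 0)) AS payment_added_count
    FROM transactions
    -- Half-open window: start_date inclusive, end_date exclusive.
    WHERE payment_date_utc >= TO_DATE($start_date)
      AND payment_date_utc <  TO_DATE($end_date)
    GROUP BY user_id
)
SELECT
    SUM(IFF(payment_added_count BETWEEN 1 AND 10, 1, 0)) AS "OCCASIONAL USERS",
    SUM(IFF(payment_added_count > 10, 1, 0)) AS "REGULAR USERS"
FROM all_user_cohort
WHERE user_id = 12345;
```

Going by the figures quoted in the question (55 payments in that window), this should count the user as regular. Two things to keep in mind: a user with no rows at all inside the window does not appear in all_user_cohort, so the zero-payment branch only catches users whose rows in the window all have is_payment_added = FALSE; and since PAYMENT_DATE_UTC is already a DATE in the schema shown, wrapping the column in TO_DATE should not be necessary.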