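It is a little hard to define exactly what I want, but I'll take a stab at it. I am working with Redshift and writing a query on top of the following sample, table A: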
User_ID || Active_in_Month || Max_Months_On_Platform
1       || 1               || 6
1       || 2               || 6
1       || 5               || 6
2       || 1               || 3
2       || 3               || 3
After grouping by "Active_in_Month", I would like to get the following output, table B:
Active_in_Month || Active_Distinct_Users || User_Cohorts
1               || 2                     || 2
2               || 1                     || 2
3               || 1                     || 2
5               || 1                     || 1
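"Active_Distinct_Users" is a simple COUNT(*). The calculation of "User_Cohorts", however, is where I am stuck. The column should show how many users stayed on the platform at least as long as the month in the "Active_in_Month" group. For example, in row 1 of table B there are two users with Max_Months_On_Platform >= 1 (the Active_in_Month value). In the row for Active_in_Month = 5 there is only 1 "User_Cohort", because only 1 user has Max_Months_On_Platform >= 5. The row for month 3 counts both users, so a user whose maximum equals the month is included.
Hopefully this explains what I am trying to achieve.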
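To make the rule concrete, here is a minimal Python sketch that recomputes table B from the sample rows. It assumes User_Cohorts counts the distinct users whose Max_Months_On_Platform is greater than or equal to the month (the month 3 row of table B only comes out as 2 under that reading). The names `rows` and `cohort_table` are made up for illustration and are not part of any query.

```python
# Sample rows from table A: (user_id, active_in_month, max_months_on_platform)
rows = [
    (1, 1, 6),
    (1, 2, 6),
    (1, 5, 6),
    (2, 1, 3),
    (2, 3, 3),
]


def cohort_table(rows):
    """Return (month, active_distinct_users, user_cohorts) per active month."""
    # One Max_Months_On_Platform value per user (assumed constant per user).
    max_by_user = {user: max_months for user, _, max_months in rows}
    result = []
    for month in sorted({m for _, m, _ in rows}):
        # Distinct users active in this month.
        active_users = {user for user, m, _ in rows if m == month}
        # Assumed rule: users whose maximum reaches this month (>=).
        cohort = sum(1 for max_months in max_by_user.values() if max_months >= month)
        result.append((month, len(active_users), cohort))
    return result


table_b = cohort_table(rows)
for line in table_b:
    print(line)
```

If the intended comparison is strictly greater than, only the `>=` in the cohort line needs to change, but then the sample row for month 3 would no longer match.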
Answer 0: (score: 1)
Solution
Solved it the following way. I am not sure it is the best approach, but it gets the job done:
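SELECT
    Active_in_Month,
    COUNT(DISTINCT user_id) AS Active_Distinct_Users,
    (
        -- Add up the user counts of every maximum that qualifies for this month
        SELECT SUM(number_of_customers)
        FROM (
            -- Distinct users per Max_Months_On_Platform value
            SELECT
                tbl_a2.Max_Months_On_Platform AS total,
                COUNT(DISTINCT tbl_a2.user_id) AS number_of_customers
            FROM tbl_a AS tbl_a2
            GROUP BY tbl_a2.Max_Months_On_Platform
        ) AS per_max  -- a derived table needs an alias
        WHERE total + 1 >= tbl_a.Active_in_Month
    ) AS total_customers
FROM tbl_a
GROUP BY Active_in_Month;  -- one output row per month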
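One way to sanity-check this approach is to run it against the sample data. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Redshift (an assumption; the two engines differ in places). It also assumes the query is written with an explicit GROUP BY Active_in_Month and an alias on the derived table; `per_max` is a made-up alias.

```python
import sqlite3

# In-memory SQLite database standing in for Redshift (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_a ("
    "user_id INTEGER, Active_in_Month INTEGER, Max_Months_On_Platform INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_a VALUES (?, ?, ?)",
    [(1, 1, 6), (1, 2, 6), (1, 5, 6), (2, 1, 3), (2, 3, 3)],
)

query = """
SELECT
    Active_in_Month,
    COUNT(DISTINCT user_id) AS Active_Distinct_Users,
    (
        SELECT SUM(per_max.number_of_customers)
        FROM (
            -- distinct users per Max_Months_On_Platform value
            SELECT
                tbl_a2.Max_Months_On_Platform AS total,
                COUNT(DISTINCT tbl_a2.user_id) AS number_of_customers
            FROM tbl_a AS tbl_a2
            GROUP BY tbl_a2.Max_Months_On_Platform
        ) AS per_max
        WHERE per_max.total + 1 >= tbl_a.Active_in_Month
    ) AS total_customers
FROM tbl_a
GROUP BY Active_in_Month
ORDER BY Active_in_Month
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the five sample rows this gives the same values as table B. Note that the condition `total + 1 >= Active_in_Month` would also count a user whose maximum is one month short of the active month, which the sample data does not exercise.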
Answer 1: (score: 0)
I hope I have understood the rule for calculating the User_Cohorts value correctly. Please try this:
SELECT
a.Active_in_Month
, COUNT(*) AS Active_Distinct_Users
, ( SELECT COUNT(DISTINCT user_id) +1
FROM tablea a2
WHERE a.Active_in_Month < a2.Max_Months_On_Platform
AND a.user_id <> a2.user_id
) AS User_Cohorts
FROM tablea a
GROUP BY a.Active_in_Month
ORDER BY a.Active_in_Month;
Sample
MariaDB [test]> SELECT
-> a.Active_in_Month
-> , COUNT(*) AS Active_Distinct_Users
-> , ( SELECT COUNT(DISTINCT user_id) +1
-> FROM tablea a2
-> WHERE a.Active_in_Month < a2.Max_Months_On_Platform
-> AND a.user_id <> a2.user_id
-> ) AS User_Cohorts
-> FROM tablea a
-> GROUP BY a.Active_in_Month
-> ORDER BY a.Active_in_Month;
+-----------------+-----------------------+--------------+
| Active_in_Month | Active_Distinct_Users | User_Cohorts |
+-----------------+-----------------------+--------------+
|               1 |                     2 |            2 |
|               2 |                     1 |            2 |
|               3 |                     1 |            2 |
|               5 |                     1 |            1 |
+-----------------+-----------------------+--------------+
4 rows in set (0.00 sec)
MariaDB [test]>
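Both answers compute User_Cohorts with a correlated subquery. As an alternative, the same numbers can probably be obtained with a plain join: one derived table with the active users per month, one with a single row per user, joined where the user's maximum reaches the month. This is only a sketch, run on in-memory SQLite through Python's sqlite3 rather than on Redshift or MariaDB. It assumes the rule is Max_Months_On_Platform >= Active_in_Month and that each user has a single Max_Months_On_Platform value. The table name `tbl_a` and the aliases `m` and `u` are chosen for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_a ("
    "user_id INTEGER, Active_in_Month INTEGER, Max_Months_On_Platform INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_a VALUES (?, ?, ?)",
    [(1, 1, 6), (1, 2, 6), (1, 5, 6), (2, 1, 3), (2, 3, 3)],
)

join_query = """
SELECT
    m.Active_in_Month,
    m.Active_Distinct_Users,
    COUNT(u.user_id) AS User_Cohorts
FROM (
    -- distinct active users per month
    SELECT Active_in_Month, COUNT(DISTINCT user_id) AS Active_Distinct_Users
    FROM tbl_a
    GROUP BY Active_in_Month
) AS m
LEFT JOIN (
    -- one row per user with that user's maximum
    SELECT DISTINCT user_id, Max_Months_On_Platform
    FROM tbl_a
) AS u
    ON u.Max_Months_On_Platform >= m.Active_in_Month
GROUP BY m.Active_in_Month, m.Active_Distinct_Users
ORDER BY m.Active_in_Month
"""

join_result = conn.execute(join_query).fetchall()
for row in join_result:
    print(row)
```

The LEFT JOIN keeps a month in the output even if no user qualifies, in which case COUNT(u.user_id) is 0. Unlike the subquery above, this version does not depend on the user_id of whichever row the engine happens to pick within a month group.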