Singular condition on an aggregated column

Time: 2017-01-28 21:46:39

Tags: mysql amazon-redshift

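It is a bit hard to define what I want, but I will take a stab at it. I am working with Redshift and writing a query on top of the following sample table, Table A: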

User ID || Active_in_Month || Max_Months_On_Platform
1          1                  6
1          2                  6
1          5                  6
2          1                  3
2          3                  3

After grouping by "Active_in_Month", I want to get the following output, Table B:

Active_in_Month || Active_Distinct_Users || User_Cohorts
1                  2                        2
2                  1                        2
3                  1                        2
5                  1                        1

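"Active_Distinct_Users" is a simple COUNT(*). The calculation of "User_Cohorts", however, is where I am stuck. This column should show how many users have a "Max_Months_On_Platform" that covers the value of the "Active_in_Month" group. For example, in row 1 of Table B there are two users with "Max_Months_On_Platform" > 1 (the Active_in_Month value). In the row for Active_in_Month = 5, "User_Cohorts" is only 1, because only one user has a "Max_Months_On_Platform" > 5.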

I hope this explains what I am trying to achieve.
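The rule described above can be made concrete with a small sketch in plain Python. It only restates the question's sample data; the function name `summarize` and the `inclusive` flag are illustrative, not part of the original post. One thing it makes visible: the examples in the text use ">", but the expected row for month 3 (User_Cohorts = 2) only comes out if the comparison is ">=".

```python
from collections import defaultdict

# Sample rows from Table A: (user_id, active_in_month, max_months_on_platform)
ROWS = [
    (1, 1, 6),
    (1, 2, 6),
    (1, 5, 6),
    (2, 1, 3),
    (2, 3, 3),
]


def summarize(rows, inclusive=True):
    """Return [(month, active_distinct_users, user_cohorts)] sorted by month.

    user_cohorts counts the users whose maximum month is >= the group's
    month (inclusive=True) or strictly > it (inclusive=False).
    """
    active = defaultdict(set)   # month -> set of users active in that month
    max_months = {}             # user -> Max_Months_On_Platform
    for user_id, month, max_m in rows:
        active[month].add(user_id)
        max_months[user_id] = max_m

    result = []
    for month in sorted(active):
        if inclusive:
            cohort = sum(1 for m in max_months.values() if m >= month)
        else:
            cohort = sum(1 for m in max_months.values() if m > month)
        result.append((month, len(active[month]), cohort))
    return result


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```

With `inclusive=True` the rows match Table B; with `inclusive=False` the month-3 row drops to a cohort of 1, so the inclusive reading seems to be the intended one.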

2 answers:

Answer 0: (score: 1)

Solution

I solved it the following way. I am not sure it is the best approach, but it gets the job done:

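SELECT
    Active_in_Month,
    COUNT(DISTINCT user_id),
    ( SELECT
          SUM(number_of_customers)
      FROM ( -- number of distinct users per Max_Months_On_Platform value
             SELECT
                 tbl_a2.Max_Months_On_Platform AS total,
                 COUNT(DISTINCT tbl_a2.user_id) AS number_of_customers
             FROM
                 tbl_a AS tbl_a2
             GROUP BY tbl_a2.Max_Months_On_Platform
           ) AS cohort_sizes  -- derived tables need an alias
      -- keep the cohorts whose maximum, plus one, reaches the current month
      WHERE total + 1 >= tbl_a.Active_in_Month
    ) AS total_customers
FROM
    tbl_a
GROUP BY
    Active_in_Month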
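The idea in this answer, counting users per Max_Months_On_Platform value first and then summing the qualifying counts, can also be written as a join instead of a correlated subquery. The following is only a sketch under stated assumptions: it runs on Python's built-in sqlite3 as a stand-in engine (not Redshift), the names `cohort_report`, `cohort_sizes` and `monthly` are made up for illustration, and it compares with `total >= Active_in_Month` rather than the answer's `total + 1 >=`. Like the answer, it assumes each user has a single Max_Months_On_Platform value.

```python
import sqlite3

# Sample rows from Table A: (user_id, Active_in_Month, Max_Months_On_Platform)
ROWS = [(1, 1, 6), (1, 2, 6), (1, 5, 6), (2, 1, 3), (2, 3, 3)]

# Assumption: plain ">=" comparison; the CTE names are hypothetical.
QUERY = """
WITH cohort_sizes AS (
    -- distinct users per Max_Months_On_Platform value
    SELECT Max_Months_On_Platform AS total,
           COUNT(DISTINCT user_id) AS number_of_customers
    FROM tbl_a
    GROUP BY Max_Months_On_Platform
),
monthly AS (
    -- distinct active users per month
    SELECT Active_in_Month,
           COUNT(DISTINCT user_id) AS active_distinct_users
    FROM tbl_a
    GROUP BY Active_in_Month
)
SELECT m.Active_in_Month,
       m.active_distinct_users,
       SUM(c.number_of_customers) AS user_cohorts
FROM monthly AS m
JOIN cohort_sizes AS c ON c.total >= m.Active_in_Month
GROUP BY m.Active_in_Month, m.active_distinct_users
ORDER BY m.Active_in_Month
"""


def cohort_report(rows):
    """Load rows into an in-memory table and return the grouped report."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl_a ("
        "user_id INTEGER, Active_in_Month INTEGER, "
        "Max_Months_On_Platform INTEGER)"
    )
    conn.executemany("INSERT INTO tbl_a VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in cohort_report(ROWS):
        print(row)
```

Because this is an inner join, a month with no qualifying cohort would disappear from the result; that cannot happen as long as every active user's maximum is at least the month in which they were active.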

Answer 1: (score: 0)

I hope I have understood the rule for calculating the User_Cohorts value correctly. Please try this:

SELECT
    a.Active_in_Month
    , COUNT(*) AS Active_Distinct_Users
    , ( SELECT COUNT(DISTINCT user_id) +1
        FROM tablea a2
        WHERE a.Active_in_Month < a2.Max_Months_On_Platform
        AND a.user_id <> a2.user_id
    ) AS User_Cohorts
FROM tablea a
GROUP BY a.Active_in_Month
ORDER BY a.Active_in_Month;

Sample

MariaDB [test]> SELECT
    ->     a.Active_in_Month
    ->     , COUNT(*) AS Active_Distinct_Users
    ->     , ( SELECT COUNT(DISTINCT user_id) +1
    ->         FROM tablea a2
    ->         WHERE a.Active_in_Month < a2.Max_Months_On_Platform
    ->         AND a.user_id <> a2.user_id
    ->     ) AS User_Cohorts
    -> FROM tablea a
    -> GROUP BY a.Active_in_Month
    -> ORDER BY a.Active_in_Month;
+-----------------+-----------------------+--------------+
| Active_in_Month | Active_Distinct_Users | User_Cohorts |
+-----------------+-----------------------+--------------+
|               1 |                     2 |            2 |
|               2 |                     1 |            2 |
|               3 |                     1 |            2 |
|               5 |                     1 |            1 |
+-----------------+-----------------------+--------------+
4 rows in set (0.00 sec)

MariaDB [test]>