Assign a category based on the percentage within a group in SQL

Time: 2017-03-15 18:41:51

Tags: sql vertica

Suppose I have a table like this:

CampaignId    Category    Strike
    1            A          2
    1            B          3
    1          Others       5
    2            A          4
    2            B          2
    3            C          1
    3            C          4
    4            A          1
    4            B          1
    4            C          1
    4            D          1
    4          Others       1

Then I compute each row's percentage of the total Strike within its CampaignId, as follows:

SELECT CampaignId, Category, Strike, (Strike::FLOAT / SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId
FROM myTable

This produces the intermediate table below:

CampaignId    Category    Strike    PercentageOfStrikesByCategoryByCampaignId
    1            A          2        20.0
    1            B          3        30.0
    1          Others       5        50.0
    2            A          4        66.6
    2            B          2        33.3
    3            C          1        20.0
    3            C          4        80.0
    4            A          1        20.0
    4            B          1        20.0
    4            C          1        20.0
    4            D          1        20.0
    4          Others       1        20.0

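Now I want to assign a final label, say FinalCategory, based on the PercentageOfStrikesByCategoryByCampaignId computed above. The gist of the condition is: if one of the categories within a CampaignId is 'Others' AND its PercentageOfStrikesByCategoryByCampaignId >= 30.0, then every other row in that CampaignId group is also labeled 'Others'. Otherwise, Category is copied to FinalCategory unchanged. The resulting table should look like this: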

CampaignId    Category    Strike    PercentageOfStrikesByCategoryByCampaignId    FinalCategory
    1            A          2        20.0                                        Others 
    1            B          3        30.0                                        Others
    1          Others       5        50.0                                        Others
    2            A          4        66.6                                        A
    2            B          2        33.3                                        B
    3            C          1        20.0                                        C
    3            C          4        80.0                                        C
    4            A          1        20.0                                        A
    4            B          1        20.0                                        B
    4            C          1        20.0                                        C
    4            D          1        20.0                                        D
    4          Others       1        20.0                                        Others
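
To pin the rule down, here is a minimal reference sketch in plain Python (not SQL, and not part of the original question). The function name `assign_final_category` and its `threshold` parameter are assumptions made for illustration. It computes each row's share of its campaign's strikes and collapses a whole campaign to 'Others' when an 'Others' row reaches the threshold.

```python
from collections import defaultdict

# Sample rows from the question: (CampaignId, Category, Strike)
rows = [
    (1, "A", 2), (1, "B", 3), (1, "Others", 5),
    (2, "A", 4), (2, "B", 2),
    (3, "C", 1), (3, "C", 4),
    (4, "A", 1), (4, "B", 1), (4, "C", 1), (4, "D", 1), (4, "Others", 1),
]

def assign_final_category(rows, threshold=30.0):
    """Return (CampaignId, Category, Strike, pct, FinalCategory) per input row."""
    # Total strikes per campaign.
    totals = defaultdict(int)
    for campaign, _, strike in rows:
        totals[campaign] += strike

    # Campaigns in which some 'Others' row reaches the threshold share.
    collapse = {
        campaign
        for campaign, category, strike in rows
        if category == "Others" and 100.0 * strike / totals[campaign] >= threshold
    }

    return [
        (campaign, category, strike,
         100.0 * strike / totals[campaign],
         "Others" if campaign in collapse else category)
        for campaign, category, strike in rows
    ]

result = assign_final_category(rows)
final = [r[4] for r in result]
```

On the sample data only campaign 1 is collapsed, because its 'Others' row holds 50% of the strikes; campaign 4 keeps its categories because 'Others' holds only 20%.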

How can I achieve this with as simple a SQL query as possible? Thanks in advance for your help!

2 answers:

Answer 0: (score: 1)

SELECT CampaignId, Category, Strike, PercentageOfStrikesByCategoryByCampaignId,
CASE WHEN Others_count > 0 AND 
     MAX(CASE WHEN Category='Others' THEN PercentageOfStrikesByCategoryByCampaignId END) OVER (PARTITION BY CampaignId) >= 30 THEN 'Others'
ELSE Category END AS FinalCategory
FROM (
SELECT CampaignId, Category, Strike, 
(Strike::FLOAT
 / SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId
,SUM(CASE WHEN Category='Others' THEN 1 ELSE 0 END) OVER (PARTITION BY CampaignId) as Others_count
FROM myTable
) T

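Additions to the existing query:

  • Others_count for each CampaignId, computed with the SUM window function.
  • A check, using the computed Others_count and a MAX window function over a CASE expression, of whether the 'Others' row has a percentage >= 30. If so, 'Others' is assigned as the final category; otherwise Category is used as is.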
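
The MAX-over-CASE idea can be tried without a Vertica instance. The sketch below runs it in SQLite through Python's `sqlite3` module, under a few assumptions: the bundled SQLite is 3.25 or newer (needed for window functions), `100.0 * Strike` stands in for the Vertica `::FLOAT` cast, the column is shortened to `Pct`, and `rowid` is used only to keep the input order. The Others_count guard is left out here because the MAX is NULL when a campaign has no 'Others' row, so the CASE falls through to the ELSE branch anyway.

```python
import sqlite3

# Sample rows from the question: (CampaignId, Category, Strike)
rows = [
    (1, "A", 2), (1, "B", 3), (1, "Others", 5),
    (2, "A", 4), (2, "B", 2),
    (3, "C", 1), (3, "C", 4),
    (4, "A", 1), (4, "B", 1), (4, "C", 1), (4, "D", 1), (4, "Others", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (CampaignId INTEGER, Category TEXT, Strike INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)

query = """
SELECT CampaignId, Category, Strike, Pct,
       -- Spread the 'Others' percentage to every row of the campaign, then compare.
       CASE WHEN MAX(CASE WHEN Category = 'Others' THEN Pct END)
                 OVER (PARTITION BY CampaignId) >= 30
            THEN 'Others' ELSE Category END AS FinalCategory
FROM (
    SELECT rowid AS rid, CampaignId, Category, Strike,
           100.0 * Strike / SUM(Strike) OVER (PARTITION BY CampaignId) AS Pct
    FROM myTable
) AS t
ORDER BY rid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```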

Answer 1: (score: 1)

Let's start with your query as a CTE or subquery:

WITH t as (
      SELECT CampaignId, Category, Strike, 
             (Strike::FLOAT / SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId
      FROM myTable
     )
select t.*,
       (case when OthersFlag > 0 then 'Others' else category end) as FinalCategory
from (select t.*,
             sum(case when category = 'Others' and PercentageOfStrikesByCategoryByCampaignId >= 30.0 then 1 else 0 end) over
                 (partition by campaignid) as OthersFlag
      from t
     ) t;
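
Two edge cases are worth checking with the flag-sum approach: an 'Others' row at exactly 30%, which the question's `>= 30.0` condition includes, and a campaign with more than one qualifying 'Others' row, where the summed flag exceeds 1. The sketch below is an illustration only, run in SQLite via Python's `sqlite3` rather than Vertica. It assumes SQLite 3.25 or newer, and the campaign ids 5 to 7, the `Pct` column name, and the `flagged` alias are made up for the test. It compares with `>= 30.0` and tests the flag with `> 0`.

```python
import sqlite3

# Hypothetical edge-case rows: (CampaignId, Category, Strike)
edge_rows = [
    (5, "A", 7), (5, "Others", 3),                    # 'Others' is exactly 30%
    (6, "A", 1), (6, "Others", 2), (6, "Others", 2),  # two qualifying 'Others' rows
    (7, "A", 8), (7, "Others", 2),                    # 'Others' is only 20%
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (CampaignId INTEGER, Category TEXT, Strike INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", edge_rows)

query = """
WITH t AS (
    SELECT rowid AS rid, CampaignId, Category, Strike,
           100.0 * Strike / SUM(Strike) OVER (PARTITION BY CampaignId) AS Pct
    FROM myTable
)
SELECT CampaignId, Category,
       CASE WHEN OthersFlag > 0 THEN 'Others' ELSE Category END AS FinalCategory
FROM (
    SELECT t.*,
           -- Counts the qualifying 'Others' rows per campaign (0, 1, or more).
           SUM(CASE WHEN Category = 'Others' AND Pct >= 30.0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY CampaignId) AS OthersFlag
    FROM t
) AS flagged
ORDER BY rid
"""
edge_result = conn.execute(query).fetchall()
```

With a strict `> 30.0` comparison campaign 5 would keep its category A, and with `OthersFlag = 1` campaign 6 would not be collapsed, since its flag sums to 2.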