BigQuery: apply rank / percent_rank to a column with a WHERE clause

Posted: 2019-11-11 00:50:05

Tags: google-bigquery

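I have a fairly wide BigQuery table with about 20-30 different columns. Each of them needs a companion percentile column that shows where that column's value falls compared with all other rows in the table. However, a column should only get a percentile if the value in another column meets a certain threshold. To demonstrate this, I've created a reproducible example below: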

WITH
  correct_games_played AS
    (
      SELECT "a" as name, 7 as num1, 0.4 as num2, 0.55 as num3
      UNION ALL SELECT "b" as name, 13 as num1, 0.53 as num2, 0.37 as num3
      UNION ALL SELECT "c" as name, 4 as num1, 0.42 as num2, 0.32 as num3
      UNION ALL SELECT "d" as name, 17 as num1, 0.6 as num2, 0.23 as num3
      UNION ALL SELECT "e" as name, 7 as num1, 0.3 as num2, 0.25 as num3
      UNION ALL SELECT "f" as name, 16 as num1, 0.7 as num2, 0.43 as num3
      UNION ALL SELECT "g" as name, 10 as num1, 0.53 as num2, 0.52 as num3
      UNION ALL SELECT "h" as name, 5 as num1, 0.54 as num2, 0.21 as num3
      UNION ALL SELECT "i" as name, 9 as num1, 0.56 as num2, 0.17 as num3
      UNION ALL SELECT "j" as name, 3 as num1, 0.75 as num2, 0.53 as num3
    )

  SELECT 
    a.*,
    -- RANK() OVER(ORDER BY a.num1 DESC) AS num1_rank,
    -- RANK() OVER(ORDER BY a.num2 DESC) AS num2_rank,
    -- RANK() OVER(ORDER BY a.num3 DESC) AS num3_rank
    RANK() OVER(ORDER BY a.num1 DESC) AS num1_rank,
    RANK() OVER(ORDER BY a.num2 WHERE a.num1 > 4 DESC) AS num2_rank
    RANK() OVER(ORDER BY a.num3 WHERE a.num1 > 3 DESC) AS num3_rank
  FROM correct_games_played as a

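This script throws the error Syntax error: Expected ")" but got keyword WHERE at [22:37], but it works if I replace the rank() calls with the commented-out ones. My goal is really simple:

  • num2_rank: rank the values in a.num2 only if a.num1 is greater than 4, otherwise show null
  • num3_rank: rank the values in a.num3 only if a.num1 is greater than 3, otherwise show null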

My table is wide, and each column may need its own condition to decide whether a row's value should be ranked. Any help would be appreciated, thanks!

1 answer:

Answer 0 (score: 1)

The following is for BigQuery Standard SQL:

#standardSQL
WITH correct_games_played AS (
  SELECT "a" AS name, 7 AS num1, 0.4 AS num2, 0.55 AS num3 UNION ALL 
  SELECT "b" AS name, 13 AS num1, 0.53 AS num2, 0.37 AS num3 UNION ALL 
  SELECT "c" AS name, 4 AS num1, 0.42 AS num2, 0.32 AS num3 UNION ALL 
  SELECT "d" AS name, 17 AS num1, 0.6 AS num2, 0.23 AS num3 UNION ALL 
  SELECT "e" AS name, 7 AS num1, 0.3 AS num2, 0.25 AS num3 UNION ALL 
  SELECT "f" AS name, 16 AS num1, 0.7 AS num2, 0.43 AS num3 UNION ALL 
  SELECT "g" AS name, 10 AS num1, 0.53 AS num2, 0.52 AS num3 UNION ALL 
  SELECT "h" AS name, 5 AS num1, 0.54 AS num2, 0.21 AS num3 UNION ALL 
  SELECT "i" AS name, 9 AS num1, 0.56 AS num2, 0.17 AS num3 UNION ALL 
  SELECT "j" AS name, 3 AS num1, 0.75 AS num2, 0.53 AS num3
)
SELECT *,
  RANK() OVER(ORDER BY num1 DESC) AS num1_rank,
  IF(num1 > 4, RANK() OVER(ORDER BY IF(num1 > 4, num2, NULL) DESC), NULL)  AS num2_rank,
  IF(num1 > 3, RANK() OVER(ORDER BY IF(num1 > 3, num3, NULL) DESC), NULL) AS num3_rank
FROM correct_games_played
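If you need PERCENT_RANK rather than RANK (the question asks for percentiles), the NULL trick above may not be enough on its own. PERCENT_RANK is computed as (rank - 1) / (rows in partition - 1), so the excluded rows, which sort last as NULLs under DESC, still count toward the denominator. One option is to partition by the condition itself, so each metric is ranked only among the rows that qualify. The sketch below checks that idea locally with Python's built-in sqlite3 module, assuming SQLite 3.25+ for window functions (SQLite, like BigQuery, sorts NULLs last under DESC). The names ROWS, QUERY and conditional_ranks are illustrative only. CASE WHEN is used instead of IF so the same SQL parses in SQLite; it should carry over to BigQuery Standard SQL, but treat that as untested.

```python
import sqlite3

# Sample rows copied from the question: (name, num1, num2, num3)
ROWS = [
    ("a", 7, 0.4, 0.55),
    ("b", 13, 0.53, 0.37),
    ("c", 4, 0.42, 0.32),
    ("d", 17, 0.6, 0.23),
    ("e", 7, 0.3, 0.25),
    ("f", 16, 0.7, 0.43),
    ("g", 10, 0.53, 0.52),
    ("h", 5, 0.54, 0.21),
    ("i", 9, 0.56, 0.17),
    ("j", 3, 0.75, 0.53),
]

# Each metric is ranked only inside the partition where its condition holds;
# rows that fail the condition land in the other partition and get NULL.
QUERY = """
SELECT
  name,
  CASE WHEN num1 > 4 THEN
    RANK() OVER (PARTITION BY num1 > 4 ORDER BY num2 DESC) END AS num2_rank,
  CASE WHEN num1 > 4 THEN
    PERCENT_RANK() OVER (PARTITION BY num1 > 4 ORDER BY num2 DESC) END AS num2_pct,
  CASE WHEN num1 > 3 THEN
    RANK() OVER (PARTITION BY num1 > 3 ORDER BY num3 DESC) END AS num3_rank,
  CASE WHEN num1 > 3 THEN
    PERCENT_RANK() OVER (PARTITION BY num1 > 3 ORDER BY num3 DESC) END AS num3_pct
FROM correct_games_played
ORDER BY name
"""


def conditional_ranks(rows):
    """Load rows into an in-memory SQLite table and run QUERY.

    Returns {name: (num2_rank, num2_pct, num3_rank, num3_pct)}; None means NULL.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE correct_games_played "
            "(name TEXT, num1 INTEGER, num2 REAL, num3 REAL)"
        )
        conn.executemany(
            "INSERT INTO correct_games_played VALUES (?, ?, ?, ?)", rows
        )
        return {name: tuple(rest) for name, *rest in conn.execute(QUERY)}
    finally:
        conn.close()


if __name__ == "__main__":
    for name, values in conditional_ranks(ROWS).items():
        print(name, values)
```

For the sample data this should give num2 ranks from 1 (f) to 8 (e), with b and g tied at 5 and NULL for c and j, while the num2 percentiles run from 0 to 1 over the eight qualifying rows instead of being stretched over all ten.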