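Querying a BigQuery table with multiple repeated records

Time: 2016-10-19 15:21:06

Tags: nested google-bigquery

I have a table that contains three repeated record types.

A sample of the table schema is as follows: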

STR string,
SKU integer,
DAILY_SALES record repeated,
DAILY_SALES.SLS_DT DATE,
DAILY_SALES.SLS_AMT FLOAT,
PROD_HIER record repeated,
PROD_HIER.PROD_DESC STRING,
PROD_HIER.DEPT   integer,
PROD_HIER.EFF_BGN_DT DATE,
STR_HIER record repeated,
STR_HIER.STR_NM string,
STR_HIER.DIV    string,
STR_HIER.EFF_BGN_DT DATE
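
For reference, the schema above maps roughly onto the following Standard SQL DDL. This is only a sketch: the dataset name `mydataset` and the table name `YourTable` are placeholders, and the legacy `integer` and `FLOAT` types are assumed to correspond to `INT64` and `FLOAT64`.

```sql
-- Sketch only: mydataset and YourTable are hypothetical names.
-- Each "record repeated" column becomes an ARRAY of STRUCTs.
CREATE TABLE mydataset.YourTable (
  STR STRING,
  SKU INT64,
  DAILY_SALES ARRAY<STRUCT<SLS_DT DATE, SLS_AMT FLOAT64>>,
  PROD_HIER ARRAY<STRUCT<PROD_DESC STRING, DEPT INT64, EFF_BGN_DT DATE>>,
  STR_HIER ARRAY<STRUCT<STR_NM STRING, DIV STRING, EFF_BGN_DT DATE>>
);
```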

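For each STR / SKU record, I need to get the data from the PROD_HIER entry with the maximum (most recent) EFF_BGN_DT, and the STR_HIER entry with the most recent STR_HIER.EFF_BGN_DT.

It would help if this could be done in both legacy SQL (for an external tool) and Standard SQL. Any ideas are much appreciated.

1 Answer:

Answer 0 (score: 1)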

  

For BigQuery Standard SQL (see Enabling Standard SQL):

SELECT 
  STR, 
  SKU, 
  (SELECT STRUCT(PROD_DESC, DEPT, EFF_BGN_DT) 
      FROM UNNEST(PROD_HIER) 
      ORDER BY EFF_BGN_DT DESC LIMIT 1
  ) AS PROD_HIER,
  (SELECT STRUCT(STR_NM, EFF_BGN_DT) 
      FROM UNNEST(STR_HIER) 
      ORDER BY EFF_BGN_DT DESC LIMIT 1
  ) AS STR_HIER
FROM YourTable
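
To try the query above without touching a real table, it can be run against inline data. The sketch below is the same query with a `WITH` clause in front of it; all sample values (store `S1`, SKU `100`, the descriptions and dates) are invented for illustration.

```sql
#standardSQL
-- Hypothetical sample data: one STR/SKU row with two entries per hierarchy.
WITH YourTable AS (
  SELECT
    'S1' AS STR,
    100 AS SKU,
    [STRUCT('Old desc' AS PROD_DESC, 10 AS DEPT, DATE '2016-01-01' AS EFF_BGN_DT),
     STRUCT('New desc' AS PROD_DESC, 20 AS DEPT, DATE '2016-06-01' AS EFF_BGN_DT)] AS PROD_HIER,
    [STRUCT('Old name' AS STR_NM, 'East' AS DIV, DATE '2015-01-01' AS EFF_BGN_DT),
     STRUCT('New name' AS STR_NM, 'West' AS DIV, DATE '2016-03-01' AS EFF_BGN_DT)] AS STR_HIER
)
SELECT
  STR,
  SKU,
  -- Latest product hierarchy entry for this row
  (SELECT STRUCT(PROD_DESC, DEPT, EFF_BGN_DT)
      FROM UNNEST(PROD_HIER)
      ORDER BY EFF_BGN_DT DESC LIMIT 1
  ) AS PROD_HIER,
  -- Latest store hierarchy entry for this row
  (SELECT STRUCT(STR_NM, EFF_BGN_DT)
      FROM UNNEST(STR_HIER)
      ORDER BY EFF_BGN_DT DESC LIMIT 1
  ) AS STR_HIER
FROM YourTable
```

With this sample row, PROD_HIER should come back as ('New desc', 20, 2016-06-01) and STR_HIER as ('New name', 2016-03-01). If DIV is also needed in the result, it can be added to the second STRUCT list in the same way as STR_NM.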
  

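For BigQuery Legacy SQL:

This assumes that each of your repeated fields has at least one entry. If that is not the case, you should slightly modify the JOIN (see more about the JOIN operator and JOIN types).
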
SELECT
  PROD_HIER.STR AS STR, 
  PROD_HIER.SKU AS SKU, 
  PROD_HIER.PROD_DESC,
  PROD_HIER.DEPT,
  PROD_HIER.EFF_BGN_DT,
  STR_HIER.STR_NM,
  STR_HIER.EFF_BGN_DT
FROM (
  SELECT 
    STR, 
    SKU, 
    PROD_HIER.PROD_DESC AS PROD_DESC,
    PROD_HIER.DEPT AS DEPT,
    PROD_HIER.EFF_BGN_DT AS EFF_BGN_DT,
    ROW_NUMBER() OVER(PARTITION BY STR, SKU ORDER BY EFF_BGN_DT DESC) AS win
  FROM YourTable
) AS PROD_HIER
JOIN (
  SELECT 
    STR, 
    SKU, 
    STR_HIER.STR_NM AS STR_NM, 
    STR_HIER.EFF_BGN_DT AS EFF_BGN_DT,
    ROW_NUMBER() OVER(PARTITION BY STR, SKU ORDER BY EFF_BGN_DT DESC) AS win
  FROM YourTable
) AS STR_HIER
ON PROD_HIER.STR = STR_HIER.STR
AND PROD_HIER.SKU = STR_HIER.SKU
AND PROD_HIER.win = STR_HIER.win
WHERE PROD_HIER.win = 1