I have a table that contains three repeated record fields.
An example of the table schema is shown below:
STR string,
SKU integer,
DAILY_SALES record repeated,
DAILY_SALES.SLS_DT DATE,
DAILY_SALES.SLS_AMT FLOAT,
PROD_HIER record repeated,
PROD_HIER.PROD_DESC STRING,
PROD_HIER.DEPT integer,
PROD_HIER.EFF_BGN_DT DATE,
STR_HIER record repeated,
STR_HIER.STR_NM string,
STR_HIER.DIV string,
STR_HIER.EFF_BGN_DT DATE
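For each STR / SKU record, I need to get the data from the PROD_HIER entry with the maximum (most recent) EFF_BGN_DT, and the STR_HIER entry with the most recent STR_HIER.EFF_BGN_DT.
It would help if this could be done in both Legacy SQL (for an external tool) and Standard SQL. Any ideas are much appreciated.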
Answer 0 (score: 1)
For BigQuery Standard SQL (see Enabling Standard SQL)
SELECT
  STR,
  SKU,
  (SELECT STRUCT(PROD_DESC, DEPT, EFF_BGN_DT)
    FROM UNNEST(PROD_HIER)
    ORDER BY EFF_BGN_DT DESC LIMIT 1
  ) AS PROD_HIER,
  (SELECT STRUCT(STR_NM, EFF_BGN_DT)
    FROM UNNEST(STR_HIER)
    ORDER BY EFF_BGN_DT DESC LIMIT 1
  ) AS STR_HIER
FROM YourTable
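To try the Standard SQL query without touching a real table, it can be run against inline sample data. The sketch below is only an illustration: the values in the WITH clause are made up, and DIV is added to the STR_HIER struct (the query above leaves it out) on the assumption that the whole latest STR_HIER entry is wanted.

```sql
#standardSQL
-- Hypothetical sample row standing in for YourTable; all values are invented.
WITH YourTable AS (
  SELECT
    'S001' AS STR,
    1001 AS SKU,
    [STRUCT('Old description' AS PROD_DESC, 10 AS DEPT, DATE '2016-01-01' AS EFF_BGN_DT),
     STRUCT('New description' AS PROD_DESC, 12 AS DEPT, DATE '2017-03-01' AS EFF_BGN_DT)] AS PROD_HIER,
    [STRUCT('Store A' AS STR_NM, 'East' AS DIV, DATE '2015-06-01' AS EFF_BGN_DT),
     STRUCT('Store A renamed' AS STR_NM, 'West' AS DIV, DATE '2017-01-15' AS EFF_BGN_DT)] AS STR_HIER
)
SELECT
  STR,
  SKU,
  -- Latest PROD_HIER entry: sort the array elements by date and keep the first.
  (SELECT STRUCT(PROD_DESC, DEPT, EFF_BGN_DT)
    FROM UNNEST(PROD_HIER)
    ORDER BY EFF_BGN_DT DESC LIMIT 1
  ) AS PROD_HIER,
  -- Latest STR_HIER entry; DIV is included here as an assumption.
  (SELECT STRUCT(STR_NM, DIV, EFF_BGN_DT)
    FROM UNNEST(STR_HIER)
    ORDER BY EFF_BGN_DT DESC LIMIT 1
  ) AS STR_HIER
FROM YourTable
```

With this sample row, the result should contain the 2017-03-01 PROD_HIER entry and the 2017-01-15 STR_HIER entry. Because each subquery is a scalar subquery, a row whose repeated field is empty should simply get NULL in that column rather than being dropped.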
For BigQuery Legacy SQL
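This assumes that each of your repeated fields has at least one entry. If that is not the case, you should slightly adjust the JOIN (see the documentation on the JOIN operator and JOIN types for more information).
SELECT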
  PROD_HIER.STR AS STR,
  PROD_HIER.SKU AS SKU,
  PROD_HIER.PROD_DESC,
  PROD_HIER.DEPT,
  PROD_HIER.EFF_BGN_DT,
  STR_HIER.STR_NM,
  STR_HIER.EFF_BGN_DT
FROM (
  SELECT
    STR,
    SKU,
    PROD_HIER.PROD_DESC AS PROD_DESC,
    PROD_HIER.DEPT AS DEPT,
    PROD_HIER.EFF_BGN_DT AS EFF_BGN_DT,
    ROW_NUMBER() OVER(PARTITION BY STR, SKU ORDER BY EFF_BGN_DT DESC) AS win
  FROM YourTable
) AS PROD_HIER
JOIN (
  SELECT
    STR,
    SKU,
    STR_HIER.STR_NM AS STR_NM,
    STR_HIER.EFF_BGN_DT AS EFF_BGN_DT,
    ROW_NUMBER() OVER(PARTITION BY STR, SKU ORDER BY EFF_BGN_DT DESC) AS win
  FROM YourTable
) AS STR_HIER
ON PROD_HIER.STR = STR_HIER.STR
  AND PROD_HIER.SKU = STR_HIER.SKU
  AND PROD_HIER.win = STR_HIER.win
WHERE PROD_HIER.win = 1