Google BigQuery - Concatenating year ranges

Time: 2017-02-01 17:33:43

Tags: mysql sql google-bigquery

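I'm trying to consolidate a large dataset of vehicle fitment data by grouping the year values. For example, a particular SKU in our database might fit a 2012 Hyundai Elantra GLS. The same SKU may also fit that same vehicle for 2013, 2014, and 2015. For very small datasets, the following query does what I'm looking for: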

SELECT
sku,
CASE
  WHEN MIN(YEAR) = MAX(YEAR) THEN MIN(YEAR)
  ELSE CONCAT(MIN(YEAR), '-', MAX(YEAR))
 END AS YEAR,
 make, model, submodel, notes
FROM
(SELECT @ldfnr:= IF((@old_make = tab.make
  AND @old_model = tab.model
  AND @old_submodel = tab.submodel
  AND @old_notes = tab.notes
  AND (@old_year = tab.`year`
  OR @old_year = tab.`year`-1)) , @ldfnr, @ldfnr+1) AS nr, tab.* ,
  @old_make := tab.make , @old_model := tab.model ,
  @old_submodel := tab.submodel , @old_notes := tab.notes ,
  @old_year := tab.`year`
FROM tableName AS tab,
  (SELECT @ldfnr:=0, @old_model:='', @old_submodel:='', @old_notes:='', @old_year:='', @old_make:=''  ) AS tmp
ORDER BY make, model, submodel, notes, `YEAR` ASC) AS mytab
GROUP BY nr
ORDER BY nr;

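However, our dataset is very large. For that reason I'm trying to load the data into Google BigQuery and run the same query there. Maybe this is a limitation of Google BigQuery, but it keeps returning an error pointing at line 9, column 2 of the query, which is where the secondary SELECT is found.

I have some sample data on SQLFiddle for reference.

I was considering using AWS for this task, but I thought I'd try here first. I appreciate your time. :-)

Edit below:

Here's what the data looks like right now: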

+------+------+-----------+-------+----------+------------------------------------------+
| SKU  | Year |   Make    | Model | Submodel |                  Notes                   |
+------+------+-----------+-------+----------+------------------------------------------+
| 0001 | 1995 | Chevrolet | Astro | Base     | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 1995 | Chevrolet | Astro | CL       | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 1995 | Chevrolet | Astro | LS       | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 1996 | Chevrolet | Astro | Base     | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 1996 | Chevrolet | Astro | CL       | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 1996 | Chevrolet | Astro | LS       | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 1997 | Chevrolet | Astro | Base     | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 1997 | Chevrolet | Astro | LT       | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 2001 | Chevrolet | Astro | Base     | Clear Lens; Chrome Housing; Pair; 1 pc.; |
+------+------+-----------+-------+----------+------------------------------------------+

Here's the expected result:

+------+-------------+-----------+-------+----------+------------------------------------------+
| SKU  |    Year     |   Make    | Model | Submodel |                  Notes                   |
+------+-------------+-----------+-------+----------+------------------------------------------+
| 0001 | 1995 - 1997 | Chevrolet | Astro | Base     | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 1995 - 1996 | Chevrolet | Astro | CL       | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 1995 - 1996 | Chevrolet | Astro | LS       | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 1997        | Chevrolet | Astro | LT       | Clear Lens; Chrome Housing; Pair; 1 pc.; |
| 0001 | 2001        | Chevrolet | Astro | Base     | Clear Lens; Chrome Housing; Pair; 1 pc.; |
+------+-------------+-----------+-------+----------+------------------------------------------+

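My apologies for not including this earlier! :-)

2 answers:

Answer 0: (score: 3)

If you just want to concatenate ranges of years, an approach using window functions is simpler (and more portable):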

select sku, make, model, submodel, notes,
       -- cast so both CASE branches are STRING (assumes year is stored as an integer)
       (case when min(year) = max(year) then cast(min(year) as string)
             else cast(min(year) as string) || '-' || cast(max(year) as string)
        end) as year
from (select qt.*,
             -- a row with no match for the previous year starts a new range; the running sum numbers the ranges
             sum(case when qtprev.make is null then 1 else 0 end) over (partition by qt.make, qt.model, qt.notes, qt.submodel, qt.sku order by qt.year) as grp
      from `tint-world-aces-processing.aces_table.queryTest` qt left join
           `tint-world-aces-processing.aces_table.queryTest` qtprev
           on qt.make = qtprev.make and qt.model = qtprev.model and
              qt.notes = qtprev.notes and qt.submodel = qtprev.submodel and
              qt.sku = qtprev.sku and qt.year = qtprev.year + 1
     ) qt
group by sku, make, model, submodel, notes, grp;

(Note the slight changes for Standard SQL.)
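
To illustrate why the self-join identifies the start of each range, here is a minimal Python sketch of the same idea. It is not part of the answer's query; the function name `year_ranges` and the shortened sample keys are made up for illustration. A row starts a new range when no row with the same key exists for the previous year, and a running count of those starts serves as the group number.

```python
from collections import defaultdict

def year_ranges(rows):
    """rows: iterable of (key, year). Returns a list of (key, first_year, last_year)."""
    years_by_key = defaultdict(set)
    for key, year in rows:
        years_by_key[key].add(year)

    ranges = []
    for key in sorted(years_by_key):
        years = years_by_key[key]
        grp = 0
        bounds = {}  # group number -> [first_year, last_year]
        for year in sorted(years):
            # No row for the previous year (the LEFT JOIN finds no match):
            # this year starts a new range, so the running count goes up.
            if year - 1 not in years:
                grp += 1
            first_last = bounds.setdefault(grp, [year, year])
            first_last[1] = year
        ranges.extend((key, first, last) for first, last in bounds.values())
    return ranges

# Hypothetical sample: the key stands in for (sku, make, model, submodel, notes).
rows = [("Base", 1995), ("Base", 1996), ("Base", 1997), ("Base", 2001),
        ("CL", 1995), ("CL", 1996), ("LT", 1997)]
print(year_ranges(rows))
```

Grouping by the key alone would merge 1995 through 2001 for "Base"; it is the group number that keeps 2001 as its own range.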

Answer 1: (score: 1)

Below is a version for BigQuery Standard SQL, without JOINs:

#standardSQL
WITH yourTable AS (
  SELECT 
    '0001' AS SKU, 1995 AS Year, 'Chevrolet' AS Make, 'Astro' AS Model, 'Base' AS Submodel, 
    'Clear Lens; Chrome Housing; Pair; 1 pc.;' AS Notes UNION ALL
  SELECT '0001', 1995, 'Chevrolet', 'Astro', 'CL', 'Clear Lens; Chrome Housing; Pair; 1 pc.;' UNION ALL
  SELECT '0001', 1995, 'Chevrolet', 'Astro', 'LS', 'Clear Lens; Chrome Housing; Pair; 1 pc.;' UNION ALL
  SELECT '0001', 1996, 'Chevrolet', 'Astro', 'Base', 'Clear Lens; Chrome Housing; Pair; 1 pc.;' UNION ALL
  SELECT '0001', 1996, 'Chevrolet', 'Astro', 'CL', 'Clear Lens; Chrome Housing; Pair; 1 pc.;' UNION ALL
  SELECT '0001', 1996, 'Chevrolet', 'Astro', 'LS', 'Clear Lens; Chrome Housing; Pair; 1 pc.;' UNION ALL
  SELECT '0001', 1997, 'Chevrolet', 'Astro', 'Base', 'Clear Lens; Chrome Housing; Pair; 1 pc.;' UNION ALL
  SELECT '0001', 1997, 'Chevrolet', 'Astro', 'LT', 'Clear Lens; Chrome Housing; Pair; 1 pc.;' UNION ALL
  SELECT '0001', 2001, 'Chevrolet', 'Astro', 'Base', 'Clear Lens; Chrome Housing; Pair; 1 pc.;'
)
SELECT SKU,
  IF(MIN(Year) = MAX(Year), 
    CAST(MIN(Year) AS STRING), 
    CONCAT(CAST(MIN(Year) AS STRING), ' - ', CAST(MAX(Year) AS STRING))
  ) AS Year, 
  Make, Model, Submodel, Notes
FROM (
  SELECT SKU, Year, Make, Model, Submodel, Notes, 
    SUM(Step) OVER(PARTITION BY SKU, Make, Model, Submodel, Notes ORDER BY Year) AS grp
  FROM (
    SELECT SKU, Year, Make, Model, Submodel, Notes, 
      IFNULL(SIGN(Year - 1 - LAG(Year) OVER(PARTITION BY SKU, Make, Model, Submodel, Notes ORDER BY Year)), 1) AS Step
    FROM yourTable  
  )
)
GROUP BY SKU, Make, Model, Submodel, Notes, grp
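
The query above computes a step per row and then a running sum of those steps. As a rough sketch of that logic outside SQL, the Python below does the same for the years of a single partition. The function name `format_year_ranges` is hypothetical, and unlike the SQL it drops duplicate years up front.

```python
from itertools import groupby

def format_year_ranges(years):
    """Collapse years into labels such as '1995 - 1997' or '2001'."""
    ordered = sorted(set(years))  # assumption: duplicates are dropped first
    labelled = []
    grp = 0
    prev = None  # plays the role of LAG(Year)
    for year in ordered:
        # Step is 1 for the first row or after a gap, 0 for a consecutive year.
        step = 1 if prev is None or year - 1 - prev > 0 else 0
        grp += step  # running SUM(Step)
        labelled.append((grp, year))
        prev = year

    labels = []
    for _, members in groupby(labelled, key=lambda pair: pair[0]):
        group_years = [year for _, year in members]
        first, last = group_years[0], group_years[-1]
        labels.append(str(first) if first == last else f"{first} - {last}")
    return labels

print(format_year_ranges([1995, 1996, 1997, 2001]))
```

For the Base submodel in the sample data this yields one label for 1995 through 1997 and a separate one for 2001, matching the expected result in the question.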