Flatten data in a BigQuery table and copy the hierarchical data into a new BigQuery table

Posted: 2017-08-09 11:37:16

Tags: google-bigquery flatten unnest

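I'm new to BigQuery. I have a table in which some of the columns are repeated records. I'm trying to flatten the table so that it becomes relational, and then insert the hierarchical data into a new BigQuery table. Is this possible, and how do I do it?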

1 answer

Answer (score: 0)

The queries below use BigQuery Standard SQL.

Suppose you have a simple table like the following:

Row id  repeated_record  
--- --  ---------------
1   1   google   
        facebook     
        viant    
2   2   dell     
        hp     

You can easily mimic it with the following query:

#standardSQL
WITH `table-with-repeated-record` AS (
  SELECT 1 AS id, ['google', 'facebook', 'viant'] AS repeated_record UNION ALL
  SELECT 2, ['dell', 'hp']
)
SELECT *
FROM `table-with-repeated-record`  

Now, to flatten it, use the following query:

#standardSQL
WITH `table-with-repeated-record` AS (
  SELECT 1 AS id, ['google', 'facebook', 'viant'] AS repeated_record UNION ALL
  SELECT 2, ['dell', 'hp']
)
SELECT id, flatted_data
FROM `table-with-repeated-record`, 
  UNNEST(repeated_record) AS flatted_data   

The result is:

Row id  flatted_data     
--- --  ------------
1   1   google   
2   1   facebook     
3   1   viant    
4   2   dell     
5   2   hp    
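The comma in `FROM ..., UNNEST(...)` is an implicit CROSS JOIN. Because of that, a row whose array is empty or NULL drops out of the flattened result. If you need to keep those rows, or you need to know where each element sat in the original array, the variant below may help. It is a sketch that adds a hypothetical row with id 3 and an empty array to the same mock data. `LEFT JOIN UNNEST` keeps that row and puts NULL in `flatted_data`. `WITH OFFSET` returns the zero-based index of each element, and `pos` is only an illustrative alias.

```sql
#standardSQL
WITH `table-with-repeated-record` AS (
  SELECT 1 AS id, ['google', 'facebook', 'viant'] AS repeated_record UNION ALL
  SELECT 2, ['dell', 'hp'] UNION ALL
  -- hypothetical row with an empty array, added only to show the LEFT JOIN behavior
  SELECT 3, ARRAY<STRING>[]
)
SELECT id, flatted_data, pos
FROM `table-with-repeated-record`
  -- LEFT JOIN keeps id 3 even though its array has no elements;
  -- WITH OFFSET exposes each element's zero-based position as pos
  LEFT JOIN UNNEST(repeated_record) AS flatted_data WITH OFFSET AS pos
ORDER BY id, pos
```

With the plain comma join, id 3 would not appear at all. With `LEFT JOIN`, it appears once, and both `flatted_data` and `pos` are NULL.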

Here is another example, where the repeated record is an array of STRUCTs:

#standardSQL
WITH `table-with-repeated-record` AS (
  SELECT 1 AS id, [STRUCT<line INT64, name STRING>(1, 'google'), (2, 'facebook'), (3, 'viant')] AS repeated_record UNION ALL
  SELECT 2, [STRUCT<line INT64, name STRING>(5, 'dell'), (6, 'hp')]
)
SELECT *
FROM `table-with-repeated-record`  

This mimics the following table:

Row id  repeated_record.line    repeated_record.name     
--- --  --------------------    --------------------
1   1   1                       google   
        2                       facebook     
        3                       viant    
2   2   5                       dell     
        6                       hp

And the way to flatten it is:

#standardSQL
WITH `table-with-repeated-record` AS (
  SELECT 1 AS id, [STRUCT<line INT64, name STRING>(1, 'google'), (2, 'facebook'), (3, 'viant')] AS repeated_record UNION ALL
  SELECT 2, [STRUCT<line INT64, name STRING>(5, 'dell'), (6, 'hp')]
)
SELECT id, flatted_data.line, flatted_data.name
FROM `table-with-repeated-record`, 
  UNNEST(repeated_record) AS flatted_data   

This ends up as:

Row id  line    name     
--- --  ----    ----
1   1   1       google   
2   1   2       facebook     
3   1   3       viant    
4   2   5       dell     
5   2   6       hp    
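If the STRUCT has many fields, listing each one by hand gets tedious. The sketch below uses the same mock data and writes `flatted_data.*`, which expands every field of the STRUCT into its own column. It should return the same `id`, `line`, and `name` columns as the query above.

```sql
#standardSQL
WITH `table-with-repeated-record` AS (
  SELECT 1 AS id, [STRUCT<line INT64, name STRING>(1, 'google'), (2, 'facebook'), (3, 'viant')] AS repeated_record UNION ALL
  SELECT 2, [STRUCT<line INT64, name STRING>(5, 'dell'), (6, 'hp')]
)
-- flatted_data.* expands all fields of the STRUCT (here: line and name)
SELECT id, flatted_data.*
FROM `table-with-repeated-record`,
  UNNEST(repeated_record) AS flatted_data
```

If the STRUCT later gains new fields, they show up automatically. Whether that helps depends on whether the destination schema should follow the source.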
  

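Do you know how to do this without specifying the data, like ['google', 'facebook', 'viant']? The size of the table isn't constant and changes from time to time, and so does the data stored in it. The only thing I know is the columns.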

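Just use the query below. It leaves out the dummy data, which was only there for the examples and for you to experiment with, and reads directly from your table: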

#standardSQL
SELECT id, flatted_data.line, flatted_data.name
FROM `yourProject.yourDataset.yourTable`, 
  UNNEST(repeated_record) AS flatted_data
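The question also asks how to copy the flattened rows into a new table. One option is a `CREATE TABLE ... AS SELECT` statement, sketched below. `yourFlatTable` is a hypothetical name for the destination table, and the other names are the same placeholders as above. If the destination may already exist, use `CREATE OR REPLACE TABLE` instead. To append to an existing table with a matching schema, use `INSERT INTO ... SELECT`.

```sql
#standardSQL
-- yourFlatTable is a hypothetical destination table name
CREATE TABLE `yourProject.yourDataset.yourFlatTable` AS
SELECT id, flatted_data.line, flatted_data.name
FROM `yourProject.yourDataset.yourTable`,
  UNNEST(repeated_record) AS flatted_data
```

Another option is to run the flattening query with a destination table set. You can do that in the query settings of the BigQuery web UI, or with the `bq` command-line tool's `--destination_table` flag together with `--use_legacy_sql=false`.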