Unnesting multiple columns in BigQuery

Time: 2020-05-13 18:49:02

Tags: sql google-bigquery

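I'm working on an UNNEST query. Below are an example of the query, the result I currently get, and the result I would like to get.

A little context: in the upload I'm currently doing, the IDs are separated by 'A' instead of ',' to force the column to be a string rather than a number, since there are multiple IDs within the same cell. The Price is separated by ','. An example of the data being uploaded: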

    Name    |    Date    |   Item_ID   |  Price
    John    |  4/17/2020 | 123A456A678 | 19.99,21.99,30.00
    Joe     |  4/17/2020 | 555A777A888 | 8.99,10.00,15.99
    Jake    |  4/18/2020 |   444A333   | 15.99,9.00
    John    |  4/18/2020 |     432     | 75.99
    Megan   |  4/18/2020 | 12A890A23A99| 5.99,6.99,9.99,10.00

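This is an example of how the data looks in the table before attempting the UNNEST. Below is the current UNNEST query, followed by an example of its output.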

WITH data AS (
  SELECT
    Name,
    Date,
    SPLIT(Item_ID, 'A') AS Item_ID_Split,
    SPLIT(Price, ',') AS Price_Split
  FROM
    Example.Table
)
SELECT
  Name,
  Date,
  Item_ID_Split,
  Price_Split
FROM data,
UNNEST(Item_ID_Split) Item_ID_Split WITH OFFSET pos1,
UNNEST(Price_Split) Price_Split WITH OFFSET pos2

The current output looks like this:

    Name   |    Date   |  Item_ID_Split | Price_Split
    John   | 4/17/2020 |      123       |   19.99
    John   | 4/17/2020 |      456       |   19.99
    John   | 4/17/2020 |      678       |   19.99
    John   | 4/17/2020 |      123       |   21.99
    John   | 4/17/2020 |      456       |   21.99
    John   | 4/17/2020 |      678       |   21.99
    John   | 4/17/2020 |      123       |   30.00
    John   | 4/17/2020 |      456       |   30.00
    John   | 4/17/2020 |      678       |   30.00
    Joe    | 4/17/2020 |      555       |   8.99
    Joe    | 4/17/2020 |      777       |   8.99
    Joe    | 4/17/2020 |      888       |   8.99
    Joe    | 4/17/2020 |      555       |   10.00
    Joe    | 4/17/2020 |      777       |   10.00
    Joe    | 4/17/2020 |      888       |   10.00
    Joe    | 4/17/2020 |      555       |   15.99
    Joe    | 4/17/2020 |      777       |   15.99
    Joe    | 4/17/2020 |      888       |   15.99
    Jake   | 4/18/2020 |      444       |   15.99
    Jake   | 4/18/2020 |      333       |   15.99
    Jake   | 4/18/2020 |      444       |   9.00
    Jake   | 4/18/2020 |      333       |   9.00
    John   | 4/18/2020 |      432       |   75.99
    Megan  | 4/18/2020 |      12        |   5.99
    Megan  | 4/18/2020 |      890       |   5.99
    Megan  | 4/18/2020 |      23        |   5.99
    Megan  | 4/18/2020 |      99        |   5.99
    Megan  | 4/18/2020 |      12        |   6.99
    Megan  | 4/18/2020 |      890       |   6.99
    Megan  | 4/18/2020 |      23        |   6.99
    Megan  | 4/18/2020 |      99        |   6.99
    Megan  | 4/18/2020 |      12        |   9.99
    Megan  | 4/18/2020 |      890       |   9.99
    Megan  | 4/18/2020 |      23        |   9.99
    Megan  | 4/18/2020 |      99        |   9.99
    Megan  | 4/18/2020 |      12        |   10.00
    Megan  | 4/18/2020 |      890       |   10.00
    Megan  | 4/18/2020 |      23        |   10.00
    Megan  | 4/18/2020 |      99        |   10.00

That is the current output of the query above. As you can see, every Item_ID is repeated for every price. The result I want looks like this:

    Name   |    Date   |  Item_ID_Split | Price_Split
    John   | 4/17/2020 |      123       |   19.99
    John   | 4/17/2020 |      456       |   21.99
    John   | 4/17/2020 |      678       |   30.00
    Joe    | 4/17/2020 |      555       |   8.99
    Joe    | 4/17/2020 |      777       |   10.00
    Joe    | 4/17/2020 |      888       |   15.99
    Jake   | 4/18/2020 |      444       |   15.99
    Jake   | 4/18/2020 |      333       |   9.00
    John   | 4/18/2020 |      432       |   75.99
    Megan  | 4/18/2020 |      12        |   5.99
    Megan  | 4/18/2020 |      890       |   6.99
    Megan  | 4/18/2020 |      23        |   9.99
    Megan  | 4/18/2020 |      99        |   10.00

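This is the result I'm looking for: no duplication at all between Item_ID_Split and Price_Split. I tried putting the SPLIT function inside the UNNEST, but the output was the same. I'm not sure how to accomplish this, so any help would be appreciated!

Thanks in advance!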

2 Answers:

Answer 0: (score: 1)

You can use WITH OFFSET and join the two unnested arrays on their positions:

SELECT Name, Date, Item_ID_Split, Price_Split
FROM data LEFT JOIN
     UNNEST(Item_ID_Split) Item_ID_Split WITH OFFSET pos1
     ON 1=1 LEFT JOIN
     UNNEST(Price_Split) Price_Split WITH OFFSET pos2
     ON pos1 = pos2;
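
For anyone who wants to try this without access to the original table, here is a minimal self-contained sketch of the same idea. The inline `data` rows are copied from the question; the column name `Day` and the aliases `item_id_part` and `price_part` are only illustrative choices made here to avoid clashing with the source column names, not part of the asker's schema.

```sql
#standardSQL
-- Two sample rows from the question stand in for Example.Table.
WITH data AS (
  SELECT 'John' AS Name, '4/17/2020' AS Day,
         '123A456A678' AS Item_ID, '19.99,21.99,30.00' AS Price UNION ALL
  SELECT 'Jake', '4/18/2020', '444A333', '15.99,9.00'
)
SELECT
  Name,
  Day,
  item_id_part AS Item_ID_Split,
  price_part   AS Price_Split
FROM data
-- One row per item, remembering its zero-based position in the array.
LEFT JOIN UNNEST(SPLIT(Item_ID, 'A')) AS item_id_part WITH OFFSET AS pos1
  ON TRUE
-- Keep only the price that sits at the same position as the item.
LEFT JOIN UNNEST(SPLIT(Price, ',')) AS price_part WITH OFFSET AS pos2
  ON pos1 = pos2
ORDER BY Name, Day, pos1;
```

The comma between the two UNNEST calls in the question acts as a cross join, which is why every item was paired with every price. Joining on the offsets keeps one price per item. Because these are LEFT JOINs, an item with no price at the same position should still appear, with a NULL price; an inner JOIN would drop it instead.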

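Answer 1: (score: 1)

Below is for BigQuery Standard SQL (the date column is named Day here, matching the sample data further down):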

#standardSQL
SELECT Name, Day, Splits.*
FROM (
  SELECT Name, Day, 
    ARRAY(
      SELECT AS STRUCT Item_ID_Split, Price_Split
      FROM UNNEST(SPLIT(Item_ID, 'A')) AS Item_ID_Split WITH OFFSET
      JOIN UNNEST(SPLIT(Price, ',')) AS Price_Split WITH OFFSET
      USING(OFFSET)
    ) AS arr
  FROM `project.dataset.table`
), UNNEST(arr) Splits   

You can test the above with the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'John' Name, '4/17/2020' Day, '123A456A678' Item_ID,'19.99,21.99,30.00' Price UNION ALL
  SELECT 'Joe', '4/17/2020', '555A777A888','8.99,10.00,15.99' UNION ALL
  SELECT 'Jake', '4/18/2020', '444A333','15.99,9.00' UNION ALL
  SELECT 'John', '4/18/2020', '432','75.99' UNION ALL
  SELECT 'Megan', '4/18/2020', '12A890A23A99','5.99,6.99,9.99,10.00' 
)
SELECT Name, Day, Splits.*
FROM (
  SELECT Name, Day, 
    ARRAY(
      SELECT AS STRUCT Item_ID_Split, Price_Split
      FROM UNNEST(SPLIT(Item_ID, 'A')) AS Item_ID_Split WITH OFFSET
      JOIN UNNEST(SPLIT(Price, ',')) AS Price_Split WITH OFFSET
      USING(OFFSET)
    ) AS arr
  FROM `project.dataset.table`
), UNNEST(arr) Splits   

The output is:

Row Name    Day         Item_ID_Split   Price_Split  
1   John    4/17/2020   123             19.99    
2   John    4/17/2020   456             21.99    
3   John    4/17/2020   678             30.00    
4   Joe     4/17/2020   555             8.99     
5   Joe     4/17/2020   777             10.00    
6   Joe     4/17/2020   888             15.99    
7   Jake    4/18/2020   444             15.99    
8   Jake    4/18/2020   333             9.00     
9   John    4/18/2020   432             75.99    
10  Megan   4/18/2020   12              5.99     
11  Megan   4/18/2020   890             6.99     
12  Megan   4/18/2020   23              9.99     
13  Megan   4/18/2020   99              10.00
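
Both answers rely on what WITH OFFSET adds to UNNEST, so it may help to look at it in isolation. This small sketch uses a literal string from the question's data; the aliases `part` and `off` are illustrative names only.

```sql
#standardSQL
-- Each array element comes back with its zero-based position.
SELECT part, off
FROM UNNEST(SPLIT('123A456A678', 'A')) AS part WITH OFFSET AS off
ORDER BY off;
-- part | off
-- 123  | 0
-- 456  | 1
-- 678  | 2
```

Since the Item_ID and Price arrays of a row are split in the same order, matching on this position is what lines each item up with its own price.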