Converting a Legacy SQL FLATTEN function to Standard SQL

Time: 2019-03-05 23:50:50

Tags: google-bigquery

I wrote the following in #LegacySQL:

   SELECT
      customer_email,
      submitted_at,
      title,
      answers.choices.labels answer_choices,
      answers.number score,
      answers.boolean true_false,
      metadata.platform device_type
    FROM
      (FLATTEN([test-test:sample.responses], answers)) resp
      LEFT JOIN [test-test:sample.forms] forms
    ON resp.answers.field.id = forms.id
    ORDER BY 1 ASC

It returns the results the way I want, one flat row per answer choice, like this:

+------------------+-------------------------+------------------------------------------------------------+--------------------+-------+------------+-------------+
|  customer_email  |      submitted_at       |                           title                            |   answer_choices   | score | true_false | device_type |
+------------------+-------------------------+------------------------------------------------------------+--------------------+-------+------------+-------------+
| myname@gmail.com | 2018-12-25 04:00:02 UTC | How would you rate this product?                           |                    |    10 |            | other       |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | What did you enjoy the most about your experience with us? | Delivery           |       |            | other       |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | What other product(s) would you like to see us make?       | Additional Colors  |       |            | other       |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | What other product(s) would you like to see us make?       | Additional Designs |       |            | other       |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | What color(s) would you want to see?                       | Green              |       |            | other       |
+------------------+-------------------------+------------------------------------------------------------+--------------------+-------+------------+-------------+

I am trying to convert it to BigQuery Standard SQL and have put together the following:

SELECT
  customer_email,
  submitted_at,
  title,
  answers.choices.labels answer_choices,
  answers.number score,
  answers.boolean true_false,
  metadata.platform device_type
FROM
`sample.responses` resp, unnest(answers) answers
LEFT JOIN `sample.forms` forms
ON answers.field.id  = forms.id
ORDER BY 1 ASC

Unfortunately, it returns the results as nested records, like this:

+------------------+-------------------------+-------------------------+------------------------------------------------------------+--------------------+-------+------------+-------------+
|  customer_email  |      submitted_at       |        landed_at        |                           title                            |   answer_choices   | score | true_false | device_type |
+------------------+-------------------------+-------------------------+------------------------------------------------------------+--------------------+-------+------------+-------------+
| myname@gmail.com | 2018-12-25 04:00:02 UTC | 2018-12-25 03:59:07 UTC | What did you enjoy the most about your experience with us? | Delivery           | null  | null       | other       |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | 2018-12-25 03:59:07 UTC | What other product(s) would you like to see us make?       | Additional Colors  | null  | null       | other       |
|                  |                         |                         |                                                            | Additional Designs |       |            |             |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | 2018-12-25 03:59:07 UTC | What color(s) would you want to see?                       | Green              | null  | null       | other       |
+------------------+-------------------------+-------------------------+------------------------------------------------------------+--------------------+-------+------------+-------------+

What am I doing wrong?

2 Answers:

Answer 0: (score: 0)

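One thing to keep in mind is that legacy SQL performs a LEFT JOIN against the array when flattening it, whereas standard SQL's comma operator performs a regular JOIN against the array. If you use a LEFT JOIN when unnesting the array, you get a row in the output (with a NULL element value) when the array is empty, whereas if you use a regular JOIN, that row is omitted. This is similar to how joins between tables work. See the migration guide for more details on the semantics.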
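The difference described above can be seen on a small inline example. This is only a sketch: the `responses` CTE, its `scores` column, and the `join_kind` label are hypothetical stand-ins and are not part of the original tables.

```sql
-- Hypothetical inline data: one row with a populated array, one with an empty array.
WITH responses AS (
  SELECT 'a@example.com' AS customer_email, [1, 2] AS scores
  UNION ALL
  SELECT 'b@example.com', ARRAY<INT64>[]
)
-- Comma operator (regular join): the row whose array is empty is dropped.
SELECT customer_email, score, 'comma' AS join_kind
FROM responses, UNNEST(scores) AS score
UNION ALL
-- LEFT JOIN: the row whose array is empty is kept, with score = NULL.
SELECT customer_email, score, 'left' AS join_kind
FROM responses LEFT JOIN UNNEST(scores) AS score
ORDER BY join_kind, customer_email, score
```

The 'comma' branch should return two rows for a@example.com and none for b@example.com, while the 'left' branch should return the same two rows plus one row for b@example.com with a NULL score.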

I think the query you want uses a LEFT JOIN between the table and the answers array:

SELECT
  customer_email,
  submitted_at,
  title,
  answer.choices.labels AS answer_choices,
  answer.number score,
  answer.boolean true_false,
  metadata.platform device_type
FROM `sample.responses` resp
LEFT JOIN UNNEST(resp.answers) AS answer
LEFT JOIN `sample.forms` forms
ON answer.field.id = forms.id
ORDER BY 1 ASC

If you want to flatten answer.choices.labels as well, you can change the query to unnest that array too.
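A possible shape for a query that also flattens answer.choices.labels is sketched below. It is not the answerer's original query. The `responses` and `forms` CTEs are hypothetical inline stand-ins for `sample.responses` and `sample.forms`, and their schemas (in particular, `choices.labels` being an ARRAY<STRING> and the ids 'q1' and 'q2') are assumptions made so the example runs on its own.

```sql
-- Hypothetical inline stand-ins for `sample.responses` and `sample.forms`;
-- the real schemas are assumed, not known.
WITH responses AS (
  SELECT
    'myname@gmail.com' AS customer_email,
    [
      STRUCT(STRUCT('q1' AS id) AS field,
             STRUCT(ARRAY<STRING>[] AS labels) AS choices,
             10 AS number),
      STRUCT(STRUCT('q2' AS id) AS field,
             STRUCT(['Additional Colors', 'Additional Designs'] AS labels) AS choices,
             CAST(NULL AS INT64) AS number)
    ] AS answers
),
forms AS (
  SELECT 'q1' AS id, 'How would you rate this product?' AS title
  UNION ALL
  SELECT 'q2', 'What other product(s) would you like to see us make?'
)
SELECT
  customer_email,
  title,
  label AS answer_choices,   -- one row per label instead of an array
  answer.number AS score
FROM responses AS resp
-- First level: one row per answer, kept even if answers is empty.
LEFT JOIN UNNEST(resp.answers) AS answer
-- Second level: one row per label, kept even if labels is empty.
LEFT JOIN UNNEST(answer.choices.labels) AS label
LEFT JOIN forms ON answer.field.id = forms.id
ORDER BY title, answer_choices
```

With this inline data, the rating question has no labels and should still produce one row (with a NULL answer_choices and a score of 10), while the second question should produce one row per label. Replacing either LEFT JOIN UNNEST with a comma join would drop answers that have no labels.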

Answer 1: (score: 0)

Suppose this is the query that constructs the initial data:

SELECT
  'myname@gmail.com' AS customer_email,
  CAST('2018-12-25 04:00:02 UTC' AS TIMESTAMP) AS submitted_at,
  CAST('2018-12-25 03:59:07 UTC' AS TIMESTAMP) AS landed_at,
  'How would you rate this product?' as title,
  STRUCT (
    [
      STRUCT('' as label)
    ] AS choices,
    10 as number,
    NULL as boolean
  ) answers,
  STRUCT (
    'other' AS platform
  ) metadata

UNION ALL

SELECT
  'myname@gmail.com' AS customer_email,
  CAST('2018-12-25 04:00:02 UTC' AS TIMESTAMP) AS submitted_at,
  CAST('2018-12-25 03:59:07 UTC' AS TIMESTAMP) AS landed_at,
  'What did you enjoy the most about your experience with us?' as title,
  STRUCT (
    [
      STRUCT('' as label)
    ] AS choices,
    NULL as number,
    NULL as boolean
  ) answers,
  STRUCT (
    'other' AS platform
  ) metadata

UNION ALL

SELECT
  'myname@gmail.com' AS customer_email,
  CAST('2018-12-25 04:00:02 UTC' AS TIMESTAMP) AS submitted_at,
  CAST('2018-12-25 03:59:07 UTC' AS TIMESTAMP) AS landed_at,
  'What other product(s) would you like to see us make?' as title,
  STRUCT (
    [
      STRUCT('Additional Designs' as label),
      STRUCT('Additional Colors' as label)
    ] AS choices,
    NULL as number,
    NULL as boolean
  ) answers,
  STRUCT (
    'other' AS platform
  ) metadata

UNION ALL

SELECT
  'myname@gmail.com' AS customer_email,
  CAST('2018-12-25 04:00:02 UTC' AS TIMESTAMP) AS submitted_at,
  CAST('2018-12-25 03:59:07 UTC' AS TIMESTAMP) AS landed_at,
  'What color(s) would you want to see?' as title,
  STRUCT (
    [
      STRUCT('Green' as label)
    ] AS choices,
    NULL as number,
    NULL as boolean
  ) answers,
  STRUCT (
    'other' AS platform
  ) metadata

Then your query would be:

WITH joined_table AS (
    SELECT
        customer_email,
        submitted_at,
        landed_at,
        title,
        answers.choices answers_choices,
        answers.number score,
        answers.boolean true_false,
        metadata.platform device_type
    FROM `sample.responses` resp
    LEFT JOIN `sample.forms` forms ON resp.answers.field.id  = forms.id
)

SELECT 
    customer_email,
    submitted_at,
    landed_at,
    title,
    unnested_answers_choices.label as answer_choices,
    score,
    true_false,
    device_type
FROM joined_table
CROSS JOIN UNNEST(answers_choices) AS unnested_answers_choices