Building a multi-level nested structure with joins in BigQuery (using nested and repeated fields)

Time: 2019-07-06 09:45:04

Tags: sql json struct nested google-bigquery

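I have a number of flat tables in BigQuery that I want to join into a single table that uses nested and repeated fields at several levels (three here, but there may be more levels in future).

Using the techniques from the docs and videos I can already do this at a single level, but I can't get the syntax right for multiple levels.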

#dummy data to demonstrate hierarchy (travellers->cities->places)

WITH 
travellers AS (
SELECT 'Jim' as traveller, 'England' as country UNION ALL
SELECT 'Jim' as traveller, 'Spain' as country UNION ALL
SELECT 'Jill' as traveller, 'France' as country),

cities AS (
SELECT 'England' as country, 'London' as city UNION ALL
SELECT 'England' as country, 'Liverpool' as city UNION ALL
SELECT 'England' as country, 'Manchester' as city  UNION ALL
SELECT 'France' as country, 'Paris' as city UNION ALL
SELECT 'France' as country, 'Nantes' as city UNION ALL
SELECT 'France' as country, 'Marseille' as city  UNION ALL
SELECT 'Spain' as country, 'Granada' as city UNION ALL
SELECT 'Spain' as country, 'Barcelona' as city UNION ALL
SELECT 'Spain' as country, 'Madrid' as city),

places AS (
SELECT 'London' as city, 'Buckingham Palace' as place UNION ALL
SELECT 'London' as city, 'Tooting Bec Lido' as place UNION ALL
SELECT 'Liverpool' as city, 'The Liver Building' as place  UNION ALL
SELECT 'Manchester' as city, 'Old Trafford' as place UNION ALL
SELECT 'Paris' as city, 'Notre Dame' as place UNION ALL
SELECT 'Paris' as city, 'Louvre' as place  UNION ALL
SELECT 'Nantes' as city, 'La Machine' as place UNION ALL
SELECT 'Marseille' as city, 'Le Stade' as place UNION ALL
SELECT 'Granada' as city, 'Alhambra' as place UNION ALL
SELECT 'Granada' as city, 'El Bar de Fede' as place UNION ALL
SELECT 'Barcelona' as city, 'Camp Nou' as place UNION ALL
SELECT 'Madrid' as city, 'Sofia Reina' as place UNION ALL
SELECT 'Madrid' as city, 'El Bar de Edu' as place UNION ALL
SELECT 'Barcelona' as city, 'La Playa' as place UNION ALL
SELECT 'Granada' as city, 'Cafe Andarax' as place),

# full table using typical join (not what I want)
full_array_flat as (SELECT * FROM travellers LEFT JOIN cities USING(country) LEFT JOIN places USING(city)),

# simple nesting at a single level (using STRUCT as I will need multiple levels in future, and will need to include additional fields of different types)
travellers_nested AS (SELECT traveller, ARRAY_AGG(STRUCT (country)) as country_array FROM travellers GROUP BY traveller),
cities_nested AS (SELECT country, ARRAY_AGG(STRUCT (city)) as city_array FROM cities GROUP BY country),
places_nested AS (SELECT city, ARRAY_AGG(STRUCT (place)) as place_array FROM places GROUP BY city),

# flattening nested arrays just for fun (!)... trying to test out different combinations
travellers_nested_flattened AS (SELECT traveller, country_flat from travellers_nested, UNNEST(country_array) as country_flat),
cities_nested_flattened AS (SELECT country, city_flat from cities_nested, UNNEST(city_array) as city_flat),
places_nested_flattened AS (SELECT city, place_flat from places_nested, UNNEST(place_array) as place_flat)

# SELECT * FROM travellers_cities_places 
SELECT "WHY OH WHY CAN'T I FIGURE THIS OUT, PLEASE HELP ME SOMEBODY!)" AS cry_for_help 

A JSON representation of the expected output is:

[
  {
    "traveller": "Jim",
    "country_array": [
      {
        "country": "England",
        "city_array": [
          {
            "city": "London",
            "place_array": [
              {
                "place": "Buckingham Palace"
              },
              {
                "place": "Tooting Bec Lido"
              }
            ] ...

However, no combination of ARRAY, STRUCT, UNNEST or JOIN gets me anything resembling this output... Please help. Thanks.

1 answer:

Answer 0 (score: 0)

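Build up the structure you want one aggregation at a time, starting from the innermost level.

Then convert the result to a string: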

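# Assumes the travellers, cities and places CTEs from the question are defined in a WITH clause above.
# Level 1 (outermost): one row per traveller; countries are collapsed into country_array.
SELECT TO_JSON_STRING(STRUCT(traveller,
                             ARRAY_AGG(STRUCT(country, city_array)) AS country_array
                            )
                     ) AS traveller_json
FROM (# Level 2: one row per (traveller, country); cities are collapsed into city_array.
      SELECT traveller, country,
             ARRAY_AGG(STRUCT(city, place_array)) AS city_array
      FROM (# Level 3 (innermost): one row per (traveller, country, city); places are collapsed into place_array.
            # ARRAY_AGG(p.place) yields an array of strings; ARRAY_AGG(STRUCT(p.place)) would give
            # {"place": ...} objects as in the expected output in the question.
            SELECT t.traveller, t.country, c.city, ARRAY_AGG(p.place) AS place_array
            FROM travellers t JOIN
                 cities c
                 ON t.country = c.country JOIN
                 places p
                 ON c.city = p.city
            GROUP BY t.traveller, t.country, c.city
           ) tcc
      GROUP BY traveller, country
     ) tc
GROUP BY traveller;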
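If the goal is a table with nested and repeated columns rather than a JSON string, an alternative worth trying (not part of the original answer, and not tested here) is to build the inner levels with correlated ARRAY subqueries and `SELECT AS STRUCT`, keeping only one GROUP BY at the outermost level. This sketch assumes the question's `travellers`, `cities` and `places` CTEs precede it in the same WITH clause.

```sql
# Sketch only: assumes the travellers, cities and places CTEs from the question are defined above.
SELECT
  traveller,
  # Outermost level: collapse each traveller's (country, city_array) rows into one array.
  ARRAY_AGG(STRUCT(country, city_array)) AS country_array
FROM (
  SELECT
    t.traveller,
    t.country,
    # Middle level: for each traveller/country row, build the array of that country's cities...
    ARRAY(
      SELECT AS STRUCT
        c.city,
        # ...innermost level: for each city, build the array of its places as {place} structs.
        ARRAY(
          SELECT AS STRUCT p.place
          FROM places AS p
          WHERE p.city = c.city
        ) AS place_array
      FROM cities AS c
      WHERE c.country = t.country
    ) AS city_array
  FROM travellers AS t
)
GROUP BY traveller
```

Unlike the inner JOINs in the answer, an ARRAY subquery that matches no rows returns an empty array, so a country without cities (or a city without places) is kept rather than dropped. Written to a destination table, `country_array`, `city_array` and `place_array` become REPEATED RECORD columns, and wrapping the outer STRUCT in `TO_JSON_STRING` as in the answer still produces the JSON shape shown in the question.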