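In this example I have a database of books, with one record per book. Each record contains the book's owner, its genre, and some other information. I need to return a sample of up to 20 rows for each owner and each genre, along with all of the data in each row.
I have this code, which gives me what I need for one data point in the row (Data_one):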
WITH `project.dataset.table` AS (
SELECT
Name name,
Genre genre,
Data_one org
FROM `project.dataset.booktable`
), search AS (
SELECT name, genre FROM
UNNEST(['Alex','James']) name,
UNNEST(['HORROR','COMEDY']) genre
)
SELECT name, genre, org
FROM (
SELECT t.name, t.genre, ARRAY_AGG(t.org LIMIT 20) orgs
FROM `project.dataset.table` t JOIN search s
ON LOWER(s.name) = LOWER(t.name)
AND LOWER(s.genre) = LOWER(t.genre)
WHERE RAND() < 0.5
GROUP BY t.name, t.genre
), UNNEST(orgs) org
ORDER BY name, genre, org
But when I try to extend it to a second (and eventually quite a few more) data point in the row, it inflates the returned records 200-fold:
WITH `project.dataset.table` AS (
SELECT
Name name,
Genre genre,
Data_one org,
Data_two org2
FROM `project.dataset.booktable`
), search AS (
SELECT name, genre FROM
UNNEST(['Alex','James']) name,
UNNEST(['HORROR','COMEDY']) genre
)
SELECT name, genre, org, org2
FROM (
SELECT t.name, t.genre, ARRAY_AGG(t.org LIMIT 20) orgs, ARRAY_AGG(t.org2 LIMIT 20) orgs2
FROM `project.dataset.table` t JOIN search s
ON LOWER(s.name) = LOWER(t.name)
AND LOWER(s.genre) = LOWER(t.genre)
WHERE RAND() < 0.5
GROUP BY t.name, t.genre
), UNNEST(orgs) org, UNNEST(orgs2) org2
ORDER BY name, genre, org, org2
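I know that UNNEST turns an array into a table, but is this somehow creating an array of arrays when it shouldn't? I'm not familiar with the syntax.
Edit: The data I want to get is all at the same level: all single data points (no arrays), and a mix of NULLABLE STRINGs, INTEGERs, TIMESTAMPs, and FLOATs.
E.g.: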
Genre STRING NULLABLE
Name STRING NULLABLE
Data_one STRING NULLABLE
Data_two STRING NULLABLE
Data_three INTEGER NULLABLE
Data_four TIMESTAMP NULLABLE
Owner | Genre  | Data_one     | Data_two     | Data_three | Data_four
Alex  | Horror | Stephen King | IT           | 3          | 2018-01-02
Alex  | Sci-fi | Andy Weir    | The Martian  | 5          | 2018-01-02
James | Horror | Bram Stoker  | Dracula      | 2          | 2018-01-02
Sarah | Horror | Stephen King | The Stand    | 3          | 2018-01-02
James | Horror | Stephen King | Pet Sematary | 2          | 2018-01-02
Answer 0 (score: 4)
Because your question lacks details, the answer below is only a direction for you to explore:
#standardSQL
SELECT name, genre, data_one, data_two FROM (
SELECT t.name, t.genre, ARRAY_AGG(t.org LIMIT 20) orgs, ARRAY_AGG(t.org2 LIMIT 20) orgs2
FROM `project.dataset.table` t JOIN search s
ON LOWER(s.name) = LOWER(t.name)
AND LOWER(s.genre) = LOWER(t.genre)
WHERE RAND() < 0.5
GROUP BY t.name, t.genre
), UNNEST(orgs) data_one WITH OFFSET pos1
, UNNEST(orgs2) data_two WITH OFFSET pos2
WHERE pos1 = pos2
ORDER BY name, genre, data_one
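As you can see, OFFSET is introduced here to identify the position of each element within its array, and then only the combinations whose positions match are kept.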
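To make the row inflation concrete, here is a minimal, self-contained sketch. The CTE name agg and the literal arrays are made up for illustration and simply stand in for orgs and orgs2. A comma join of two UNNESTs combines every element of the first array with every element of the second, so two 20-element arrays would give 20 x 20 = 400 rows per group instead of 20.

```sql
#standardSQL
-- Hypothetical literal arrays standing in for orgs and orgs2.
WITH agg AS (
  SELECT ['a1', 'a2', 'a3'] AS orgs, ['b1', 'b2', 'b3'] AS orgs2
)
-- Two UNNESTs joined with a comma: every element of orgs is combined
-- with every element of orgs2, giving 3 x 3 = 9 rows.
SELECT 'cross' AS mode, data_one, data_two
FROM agg, UNNEST(orgs) data_one, UNNEST(orgs2) data_two
UNION ALL
-- Same join, but only combinations with equal positions are kept: 3 rows.
SELECT 'paired' AS mode, data_one, data_two
FROM agg,
  UNNEST(orgs) data_one WITH OFFSET pos1,
  UNNEST(orgs2) data_two WITH OFFSET pos2
WHERE pos1 = pos2
ORDER BY mode, data_one, data_two
```

The 'cross' rows show the blow-up from the question, and the 'paired' rows show what the pos1 = pos2 filter leaves behind.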
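In a real use case you will most likely also have some field that identifies which data_one and data_two belong to the same row, and that field can be used to pair data_one with data_two.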
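As a rough sketch of that idea, assuming the table had a unique key per row: the column book_id, the CTE names books and picked, and the inline rows below are all hypothetical. One way to use such a key is to collect up to 20 keys per group and then join back to the table, so every column comes from the same original row.

```sql
#standardSQL
WITH books AS (
  -- Hypothetical rows; book_id is an assumed unique key, not part of the schema in the question.
  SELECT 1 AS book_id, 'Alex' AS name, 'Horror' AS genre, 'Stephen King' AS data_one, 'IT' AS data_two UNION ALL
  SELECT 2, 'James', 'Horror', 'Bram Stoker', 'Dracula' UNION ALL
  SELECT 3, 'James', 'Horror', 'Stephen King', 'Pet Sematary'
), picked AS (
  -- Keep at most 20 keys per (name, genre) group.
  SELECT name, genre, ARRAY_AGG(book_id LIMIT 20) AS ids
  FROM books
  GROUP BY name, genre
)
-- Join the picked keys back to the full rows.
SELECT b.name, b.genre, b.data_one, b.data_two
FROM picked p
CROSS JOIN UNNEST(p.ids) AS id
JOIN books b ON b.book_id = id
ORDER BY b.name, b.genre, b.data_one
```

This avoids relying on array positions at all, at the cost of a second pass over the table.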
Hope this helps you find a direction.
Update
Now that you have added the schema and an example, see below:
#standardSQL
SELECT name, genre, data.data_one, data.data_two, data.data_three, data.data_four
FROM (
SELECT t.name, t.genre,
ARRAY_AGG(STRUCT(data_one, data_two, data_three, data_four) LIMIT 20) data
FROM `project.dataset.table` t JOIN search s
ON LOWER(s.name) = LOWER(t.name)
AND LOWER(s.genre) = LOWER(t.genre)
WHERE RAND() < 0.5
GROUP BY t.name, t.genre
), UNNEST(data) data
ORDER BY name, genre
This is exactly what I mentioned in my comment on your first related question in another post (you can just use org.data_one, org.data_two in your select statement).
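For anyone who wants to try the STRUCT approach without access to the real table, here is a self-contained sketch that inlines the five sample rows from the question. The CTE name books and the alias d are made up for illustration, and the search filter and RAND() sampling are left out to keep it short.

```sql
#standardSQL
WITH books AS (
  -- Sample rows copied from the question; the Owner column is called name here.
  SELECT 'Alex' AS name, 'Horror' AS genre, 'Stephen King' AS data_one,
         'IT' AS data_two, 3 AS data_three, TIMESTAMP '2018-01-02' AS data_four UNION ALL
  SELECT 'Alex', 'Sci-fi', 'Andy Weir', 'The Martian', 5, TIMESTAMP '2018-01-02' UNION ALL
  SELECT 'James', 'Horror', 'Bram Stoker', 'Dracula', 2, TIMESTAMP '2018-01-02' UNION ALL
  SELECT 'Sarah', 'Horror', 'Stephen King', 'The Stand', 3, TIMESTAMP '2018-01-02' UNION ALL
  SELECT 'James', 'Horror', 'Stephen King', 'Pet Sematary', 2, TIMESTAMP '2018-01-02'
)
SELECT name, genre, d.data_one, d.data_two, d.data_three, d.data_four
FROM (
  -- One array of whole-row structs per group, capped at 20 elements,
  -- so the fields of a row always travel together.
  SELECT name, genre,
    ARRAY_AGG(STRUCT(data_one, data_two, data_three, data_four) LIMIT 20) AS data
  FROM books
  GROUP BY name, genre
), UNNEST(data) AS d
ORDER BY name, genre
```

Because every group here has fewer than 20 rows, all five rows come back, each with its own data_one through data_four intact. Adding more columns only means adding more fields to the STRUCT, with no extra UNNEST.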