I have two tables.
To simplify the problem definition, here is the schema of the first one:
student_id int
phones repeated
- phones.number string
- phones.type string
The second table:
student_id int
courses repeated
- courses.id int
- courses.name string
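Both tables have the same number of rows and the same student IDs. What I need is to combine the two repeated records into one master student table (keeping the 2 separate repeated fields), something like: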
student_id int
phones repeated
- phones.number string
- phones.type string
courses repeated
- courses.id int
- courses.name string
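How can I do this in BigQuery? (I tried a few approaches, but all of them ended up creating duplicate rows for the repeated fields. It would be great to get a fresh perspective from the BigQuery experts on Stack Overflow.) Thanks in advance.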
Answer 0 (score: 2)
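You need to JOIN the two tables and select the relevant columns from them. It is easier to set up an example using standard SQL (uncheck "Use Legacy SQL" under "Show Options"), but a similar idea applies to legacy SQL.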
WITH Students AS (
SELECT
1 AS student_id,
ARRAY<STRUCT<number STRING, type STRING>>[
STRUCT("(555) 555-5555", "cell")] AS phones
UNION ALL SELECT
5 AS student_id,
ARRAY<STRUCT<number STRING, type STRING>>[
STRUCT("(555) 555-1234", "home"),
STRUCT("(555) 555-4321", "cell")] AS phones
),
Courses AS (
SELECT
5 AS student_id,
ARRAY<STRUCT<id INT64, name STRING>>[
STRUCT(10, "Data Analysis")] AS courses
UNION ALL SELECT
1 AS student_id,
ARRAY<STRUCT<id INT64, name STRING>>[
STRUCT(10, "Data Analysis"),
STRUCT(101, "Algorithms")] AS courses
)
SELECT
student_id,
phones,
courses
FROM Students
JOIN Courses
USING (student_id);
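As a quick sanity check on the query above, here is a minimal sketch that reuses the same sample rows and counts the elements of each array. The aliases `phone_count` and `course_count` are made up for illustration. If the join kept both repeated fields intact, it should return exactly one row per student rather than one row per phone/course combination.

```sql
WITH Students AS (
  SELECT
    1 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-5555", "cell")] AS phones
  UNION ALL SELECT
    5 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-1234", "home"),
      STRUCT("(555) 555-4321", "cell")] AS phones
),
Courses AS (
  SELECT
    5 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis")] AS courses
  UNION ALL SELECT
    1 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis"),
      STRUCT(101, "Algorithms")] AS courses
)
SELECT
  student_id,
  -- ARRAY_LENGTH counts elements without unnesting, so no rows are duplicated.
  ARRAY_LENGTH(phones) AS phone_count,
  ARRAY_LENGTH(courses) AS course_count
FROM Students
JOIN Courses
USING (student_id)
ORDER BY student_id;
-- Expected with the sample data above:
--   student_id = 1: phone_count = 1, course_count = 2
--   student_id = 5: phone_count = 2, course_count = 1
```

The arrays are never passed through UNNEST here, which is why the row count stays at one per student.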
Legacy SQL would use something like:
SELECT
s.student_id AS student_id,
s.phones.number,
s.phones.type,
c.courses.id,
c.courses.name
FROM Students s
JOIN Courses c
ON s.student_id = c.student_id;
Answer 1 (score: 0)
For Legacy SQL:
SELECT
s.student_id AS student_id,
phones.number,
phones.type,
courses.id,
courses.name
FROM Students s
JOIN Courses c
ON s.student_id = c.student_id
Note: you need to check the Allow Large Results checkbox and uncheck the Flatten Results checkbox, and save the results to a table in order to preserve the schema.
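The note above applies to Legacy SQL. For comparison, standard SQL does not flatten results, so the joined output could be written to a table directly with a DDL statement. This is only a sketch under assumptions: `mydataset` and `students_master` are hypothetical names, and `Students` and `Courses` are assumed to be tables in that same dataset.

```sql
-- Hypothetical dataset and table names; adjust them to your own project.
CREATE OR REPLACE TABLE mydataset.students_master AS
SELECT
  student_id,
  phones,   -- remains a repeated record in the new table
  courses   -- remains a repeated record in the new table
FROM mydataset.Students
JOIN mydataset.Courses
USING (student_id);
```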