BigQuery: combining repeated fields from 2 different tables

Time: 2016-08-15 18:44:50

Tags: google-bigquery

I have two tables.

To simplify the problem definition, here is the schema of the first table:

student_id int
phones repeated 
- phones.number string
- phones.type string

The second table:

student_id int
courses repeated 
- courses.id int
- courses.name string

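Both tables have the same number of rows and the same student IDs. What I need is to combine the two repeated records into a single master student table, keeping the 2 different repeated fields. Something like: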

student_id int
phones repeated 
- phones.number string
- phones.type string
courses repeated 
- courses.id int 
- courses.name string

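How can I do this in BigQuery? I tried a few approaches, but they all ended up creating duplicate rows for the repeated fields, so it would be good to get a fresh perspective from the BigQuery experts on Stack Overflow. Thanks in advance.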

2 answers:

Answer 0: (score: 2)

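You need to JOIN the two tables and select the relevant columns from them. It is easier to set up an example using standard SQL (uncheck "Use Legacy SQL" under "Show Options"), but a similar idea applies to legacy SQL.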

WITH Students AS (
  SELECT
    1 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-5555", "cell")] AS phones
  UNION ALL SELECT
    5 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-1234", "home"),
      STRUCT("(555) 555-4321", "cell")] AS phones
),
Courses AS (
  SELECT
    5 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis")] AS courses
  UNION ALL SELECT
    1 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis"),
      STRUCT(101, "Algorithms")] AS courses
)
SELECT
  student_id,
  phones,
  courses
FROM Students
JOIN Courses
USING (student_id);
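
As a quick sanity check on the join above, here is a minimal sketch (standard SQL, using the same made-up sample rows) that counts the elements of each array per student. The column names phone_count, course_count, and rows_if_flattened are only illustrative. The join itself should return one row per student; the last column shows how many rows that student would turn into if both repeated fields were flattened, which is probably where the duplicate rows in the question came from.

```sql
-- Sample data copied from the example above (hypothetical values).
WITH Students AS (
  SELECT
    1 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-5555", "cell")] AS phones
  UNION ALL SELECT
    5 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-1234", "home"),
      STRUCT("(555) 555-4321", "cell")] AS phones
),
Courses AS (
  SELECT
    5 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis")] AS courses
  UNION ALL SELECT
    1 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis"),
      STRUCT(101, "Algorithms")] AS courses
)
SELECT
  student_id,
  -- Each array stays intact, so the join yields one row per student.
  ARRAY_LENGTH(phones) AS phone_count,
  ARRAY_LENGTH(courses) AS course_count,
  -- Flattening both arrays would give phones x courses rows per student.
  ARRAY_LENGTH(phones) * ARRAY_LENGTH(courses) AS rows_if_flattened
FROM Students
JOIN Courses
USING (student_id)
ORDER BY student_id;
```

With this sample data, student 1 has 1 phone and 2 courses, and student 5 has 2 phones and 1 course, so each of them would become 2 rows if flattened.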

Legacy SQL would use something like:

SELECT
  s.student_id AS student_id,
  s.phones.number,
  s.phones.type,
  c.courses.id,
  c.courses.name
FROM Students s
JOIN Courses c
ON s.student_id = c.student_id;

Answer 1: (score: 0)

For legacy SQL:

SELECT 
   s.student_id AS student_id,
   phones.number,
   phones.type,
   courses.id,
   courses.name
FROM Students s
JOIN Courses c
ON s.student_id = c.student_id

Note: you need to check the Allow Large Results checkbox, uncheck the Flatten Results checkbox, and save the results to a table in order to preserve the schema.
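
For comparison, here is a rough sketch of saving the combined result with standard SQL instead, assuming CREATE TABLE ... AS SELECT is available in your project. The names mydataset.students, mydataset.courses, and mydataset.students_master are hypothetical placeholders, not names from the question. As far as I know the Flatten Results option only affects legacy SQL, so the repeated fields should be written to the new table as they are.

```sql
-- Hypothetical names: replace mydataset.* with your own dataset and tables.
CREATE TABLE mydataset.students_master AS
SELECT
  student_id,
  s.phones,   -- repeated record, kept as an array of structs
  c.courses   -- repeated record, kept as an array of structs
FROM mydataset.students AS s
JOIN mydataset.courses AS c
USING (student_id);
```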