Question

I have two tables:

To simplify the problem definition, here is the schema of the first table:

student_id int
phones repeated 
- phones.number string
- phones.type string

The second table:

student_id int
courses repeated 
- courses.id int
- courses.name string

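Both tables have the same number of rows and the same student IDs. What I need is to combine the two repeated records into one master student table (keeping the 2 different repeated fields), something like: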

student_id int
phones repeated 
- phones.number string
- phones.type string
courses repeated 
- courses.id int 
- courses.name string

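How can I do this in BigQuery? (I tried a few approaches, but they all ended up creating duplicate rows for the repeated fields. It would be good to get a fresh perspective from the BigQuery masters on Stack Overflow.) Thanks in advance.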

Answer 1

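You need to JOIN the two datasets and select the relevant columns from them. It is easier to set up an example using standard SQL (uncheck "Use Legacy SQL" under "Show Options"), but a similar idea applies to legacy SQL.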

WITH Students AS (
  SELECT
    1 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-5555", "cell")] AS phones
  UNION ALL SELECT
    5 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-1234", "home"),
      STRUCT("(555) 555-4321", "cell")] AS phones
),
Courses AS (
  SELECT
    5 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis")] AS courses
  UNION ALL SELECT
    1 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis"),
      STRUCT(101, "Algorithms")] AS courses
)
SELECT
  student_id,
  phones,
  courses
FROM Students
JOIN Courses
USING (student_id);
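
As a quick sanity check on the join above, the following sketch counts the elements of each repeated field per student. It reuses the same sample data; the aliases phone_count and course_count are illustrative names, not part of the original answer. If the join keeps one row per student, each student_id should appear exactly once.

```sql
-- Same sample data as the example above.
WITH Students AS (
  SELECT
    1 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-5555", "cell")] AS phones
  UNION ALL SELECT
    5 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-1234", "home"),
      STRUCT("(555) 555-4321", "cell")] AS phones
),
Courses AS (
  SELECT
    5 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis")] AS courses
  UNION ALL SELECT
    1 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis"),
      STRUCT(101, "Algorithms")] AS courses
)
-- phone_count and course_count are hypothetical aliases for illustration.
SELECT
  student_id,
  ARRAY_LENGTH(phones) AS phone_count,
  ARRAY_LENGTH(courses) AS course_count
FROM Students
JOIN Courses
USING (student_id)
ORDER BY student_id;
```

With this sample data the query should return two rows: student 1 with 1 phone and 2 courses, and student 5 with 2 phones and 1 course. The arrays are carried through the join as whole values rather than being flattened, which is why no extra rows appear.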

Legacy SQL would use something like:

SELECT
  s.student_id AS student_id,
  s.phones.number,
  s.phones.type,
  c.courses.id,
  c.courses.name
FROM Students s
JOIN Courses c
ON s.student_id = c.student_id;

Answer 2

For Legacy SQL:

SELECT 
   s.student_id AS student_id,
   phones.number,
   phones.type,
   courses.id,
   courses.name
FROM Students s
JOIN Courses c
ON s.student_id = c.student_id

Note: you need to check the Allow Large Results checkbox, uncheck the Flatten Results checkbox, and save the result to a table in order to preserve the schema.
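
For comparison, in standard SQL the nested schema can be persisted directly with a CREATE TABLE ... AS SELECT statement, and the Flatten Results setting does not come into play. The sketch below assumes a hypothetical dataset named mydataset that already contains the Students and Courses tables; master_students is likewise an illustrative name.

```sql
-- Standard SQL sketch. mydataset and master_students are hypothetical names.
CREATE TABLE mydataset.master_students AS
SELECT
  student_id,
  phones,   -- repeated record, kept as an array of structs
  courses   -- repeated record, kept as an array of structs
FROM mydataset.Students
JOIN mydataset.Courses
USING (student_id);
```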

BigQuery: combine repeated fields from 2 different tables
