LEAD and analytic functions in BigQuery

Time: 2018-07-03 11:09:37

Tags: sql performance join google-bigquery state

Suppose my table looks like this:

[image: input table, not preserved]

I am trying to transform it into a table with this information:

[image: desired output table, not preserved]

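I added two columns. WhenWasLastBasicSubjectDone tells you in which semester the student most recently completed a Basic subject (ordered by semester). The other column, TotalBasicSubjectsDoneTillNow, tells you how many Basic subjects the student has completed so far (again ordered by semester).

I think this would be easy to solve with joins and UDFs, but I want to take advantage of the analytic functions BigQuery already provides and solve it without joins.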

2 answers:

Answer 0 (score: 1)

You can use window functions for this, assuming you have a column that specifies the ordering. Let me assume that column is semester:

select t.*,
       -- latest semester so far in which this student took 'Basic'
       max( case when subject = 'Basic' then semester end ) over (partition by student order by semester) as lastbasic,
       -- running count of 'Basic' subjects for this student
       sum( case when subject = 'Basic' then 1 else 0 end ) over (partition by student order by semester) as numbasictillnow
from t
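
The running ("so far") behaviour comes from the window frame. When an OVER clause has an ORDER BY but no explicit frame, BigQuery uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW by default. The sketch below spells that frame out; the inline table t and its three rows are made up for illustration and are not the asker's data.

```sql
#standardSQL
-- Hypothetical three-row sample standing in for the real table.
WITH t AS (
  SELECT 1 AS student, 'Sub1' AS subject, 'Sem1' AS semester UNION ALL
  SELECT 1, 'Basic', 'Sem2' UNION ALL
  SELECT 1, 'Sub2', 'Sem3'
)
SELECT t.*,
  -- Latest 'Basic' semester among rows up to and including the current one.
  MAX(CASE WHEN subject = 'Basic' THEN semester END) OVER (
    PARTITION BY student ORDER BY semester
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS lastbasic,
  -- Running count of 'Basic' rows over the same frame.
  SUM(CASE WHEN subject = 'Basic' THEN 1 ELSE 0 END) OVER (
    PARTITION BY student ORDER BY semester
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS numbasictillnow
FROM t
```

With this sample, the Sem1 row has a NULL lastbasic and a count of 0, and both the Sem2 and Sem3 rows have lastbasic 'Sem2' and a count of 1.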

Answer 1 (score: 1)

Below is for BigQuery Standard SQL:

#standardSQL
SELECT *,
  LAST_VALUE(IF(subject='Basic',semester,NULL) IGNORE NULLS) OVER(win) AS WhenWasLastBasicSubjectDone ,
  COUNTIF(subject='Basic') OVER(win) AS TotalBasicSubjectsDoneTillNow     
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY student ORDER BY semester)

You can test and play with this using the dummy data from the question, as below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 Student, 'Sub1' Subject, 'Sem1' Semester UNION ALL
  SELECT 1, 'Sub2', 'Sem2' UNION ALL
  SELECT 1, 'Basic', 'Sem3' UNION ALL
  SELECT 1, 'Basic', 'Sem4' UNION ALL
  SELECT 1, 'Sub3', 'Sem5' UNION ALL
  SELECT 1, 'Sub2', 'Sem6' UNION ALL
  SELECT 1, 'Sub3', 'Sem7' UNION ALL
  SELECT 1, 'Sub4', 'Sem8' 
)
SELECT *,
  LAST_VALUE(IF(subject='Basic',semester,NULL) IGNORE NULLS) OVER(win) AS WhenWasLastBasicSubjectDone ,
  COUNTIF(subject='Basic') OVER(win) AS TotalBasicSubjectsDoneTillNow     
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY student ORDER BY semester)
-- ORDER BY Semester
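
One caveat when the ordering column is a string label such as 'Sem1': strings sort lexicographically, so 'Sem10' would come before 'Sem2'. If the labels can go past 9, one possible workaround (a sketch, assuming every label contains exactly one run of digits) is to order the window by the numeric part. The sample rows below are hypothetical.

```sql
#standardSQL
-- Hypothetical data: labels go past 9, so plain string order would be wrong.
WITH sample AS (
  SELECT 1 AS student, 'Basic' AS subject, 'Sem2' AS semester UNION ALL
  SELECT 1, 'Sub1', 'Sem9' UNION ALL
  SELECT 1, 'Basic', 'Sem10'
)
SELECT *,
  LAST_VALUE(IF(subject = 'Basic', semester, NULL) IGNORE NULLS) OVER (win) AS WhenWasLastBasicSubjectDone,
  COUNTIF(subject = 'Basic') OVER (win) AS TotalBasicSubjectsDoneTillNow
FROM sample
-- Order by the number extracted from the label instead of the raw string.
WINDOW win AS (
  PARTITION BY student
  ORDER BY CAST(REGEXP_EXTRACT(semester, r'\d+') AS INT64)
)
```

Ordering numerically makes 'Sem10' the last row, so its running count is 2; with plain string ordering it would be processed first and get a count of 1.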