Convert rows to columns - BigQuery

Time: 2019-10-08 12:54:54

Tags: google-bigquery

I have a table like the one shown below:

[image: original table]

As you can see, there are two rows for the same subject. Each row represents one day.

However, I would like to convert them into a single row, as shown below:

[image: desired output table]

Can anyone help? I did check this post, but I couldn't translate it to my case.

1 answer:

Answer 0 (score: 2)

  

"I did check this post, but I couldn't translate it to my case."

Let's first transform your original data into a form that can be pivoted, using the following query:

#standardSQL
-- delta = day number within each (subject_id, hm_id, icu_id) group, starting at 1
SELECT subject_id, hm_id, icu_id, balance, 
  DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
FROM `project.dataset.table` 
-- ORDER BY subject_id, hm_id, icu_id, delta

If applied to the sample data from your question, the result is:

Row subject_id  hm_id   icu_id  balance delta    
1   124         ab      cd      2       1    
2   124         ab      cd      5       2    
3   321         xy      pq      -6      1    
4   321         xy      pq      1       2     
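To make the delta step concrete, here is a plain-Python sketch of what the query computes (not BigQuery code). The question's dates were only posted as an image, so the dates below are hypothetical, chosen so the output matches the table above.

```python
from datetime import date

# Hypothetical input rows: (subject_id, hm_id, icu_id, day, balance).
# The real dates were not recoverable from the question, so these are made up.
rows = [
    (124, "ab", "cd", date(2019, 1, 1), 2),
    (124, "ab", "cd", date(2019, 1, 2), 5),
    (321, "xy", "pq", date(2019, 3, 10), -6),
    (321, "xy", "pq", date(2019, 3, 11), 1),
]

def add_delta(rows):
    """Mimic DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1."""
    # First pass: earliest day per partition, i.e. MIN(day) OVER(PARTITION BY ...)
    first_day = {}
    for subject_id, hm_id, icu_id, day, _ in rows:
        key = (subject_id, hm_id, icu_id)
        if key not in first_day or day < first_day[key]:
            first_day[key] = day
    # Second pass: number of days since that first day, plus 1
    result = []
    for subject_id, hm_id, icu_id, day, balance in rows:
        delta = (day - first_day[(subject_id, hm_id, icu_id)]).days + 1
        result.append((subject_id, hm_id, icu_id, balance, delta))
    return result

for row in add_delta(rows):
    print(row)
```

Because delta counts calendar days, a group with a missing day (for example, rows on day 1 and day 3 only) gets deltas 1 and 3, with no 2.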

So now we need to pivot on the delta column: the balance for delta = 1 goes into day_1_balance, the balance for delta = 2 goes into day_2_balance, and so on.

For now, let's assume there are only two deltas (as in your sample data). In this simplified case, the query below does the trick:

#standardSQL
SELECT subject_id, hm_id, icu_id,
  MAX(IF(delta = 1, balance, NULL)) day_1_balance,
  MAX(IF(delta = 2, balance, NULL)) day_2_balance  
FROM (
  SELECT subject_id, hm_id, icu_id, balance, 
    DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
  FROM `project.dataset.table` 
)
GROUP BY subject_id, hm_id, icu_id
-- ORDER BY subject_id, hm_id, icu_id

with the result:

Row subject_id  hm_id   icu_id  day_1_balance   day_2_balance    
1   124         ab      cd      2               5    
2   321         xy      pq      -6              1      
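The MAX(IF(delta = n, balance, NULL)) pattern is conditional aggregation: each output column keeps only the rows for its own delta, and MAX collapses them to one value per group. A minimal plain-Python sketch of the same idea, using the delta-step output from the answer as input:

```python
from collections import defaultdict

# Output of the delta step for the question's sample data (taken from the table above)
with_delta = [
    (124, "ab", "cd", 2, 1),
    (124, "ab", "cd", 5, 2),
    (321, "xy", "pq", -6, 1),
    (321, "xy", "pq", 1, 2),
]

def pivot_two_days(rows):
    """Mimic MAX(IF(delta = 1, ...)), MAX(IF(delta = 2, ...)) ... GROUP BY subject_id, hm_id, icu_id."""
    groups = defaultdict(lambda: {1: None, 2: None})
    for subject_id, hm_id, icu_id, balance, delta in rows:
        cells = groups[(subject_id, hm_id, icu_id)]
        if delta in cells:  # rows with any other delta are ignored, like IF(...) returning NULL
            current = cells[delta]
            # MAX skips NULLs, so the first real value is kept until a larger one appears
            cells[delta] = balance if current is None else max(current, balance)
    return [key + (cells[1], cells[2]) for key, cells in sorted(groups.items())]

for row in pivot_two_days(with_delta):
    print(row)
```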

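Obviously, in a real case you don't know how many deltas (and therefore day columns) there will be, so you need to build the above query dynamically. This is exactly what the post you referenced helps with.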
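What "dynamic" means here is that the list of output columns comes from the distinct delta values in the data, not from the query text. A plain-Python sketch of that idea (not BigQuery code); the third day for subject 124 is a hypothetical row added to show how a group without that day ends up with a NULL (None) cell:

```python
from collections import defaultdict

def pivot_dynamic(rows):
    """Pivot (subject_id, hm_id, icu_id, balance, delta) rows into one row per key,
    with one day_<n>_balance column per distinct delta found in the data."""
    deltas = sorted({row[4] for row in rows})
    columns = ["subject_id", "hm_id", "icu_id"] + [f"day_{d}_balance" for d in deltas]
    groups = defaultdict(dict)
    for subject_id, hm_id, icu_id, balance, delta in rows:
        cells = groups[(subject_id, hm_id, icu_id)]
        # Keep the largest balance per (key, delta), like MAX(IF(delta = n, balance, NULL))
        cells[delta] = balance if delta not in cells else max(cells[delta], balance)
    # A key with no row for some delta gets None, as MAX over only NULLs returns NULL
    table = [key + tuple(cells.get(d) for d in deltas) for key, cells in sorted(groups.items())]
    return columns, table

# Delta-step output from the answer, plus one hypothetical third day for subject 124
rows = [
    (124, "ab", "cd", 2, 1),
    (124, "ab", "cd", 5, 2),
    (124, "ab", "cd", 7, 3),  # hypothetical row
    (321, "xy", "pq", -6, 1),
    (321, "xy", "pq", 1, 2),
]

columns, table = pivot_dynamic(rows)
print(columns)
for row in table:
    print(row)
```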

You can try it again yourself, or see the final solution below.

Step 1 - generate the query

#standardSQL
WITH temp AS (
  SELECT subject_id, hm_id, icu_id, balance, 
    DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
  FROM `project.dataset.table` 
)
-- ORDER BY inside STRING_AGG keeps the day columns in delta order;
-- an ORDER BY in the subquery alone does not guarantee the aggregation order
SELECT CONCAT('SELECT subject_id, hm_id, icu_id,', 
   STRING_AGG(
      CONCAT(' MAX(IF(delta = ',CAST(delta AS STRING),', balance, NULL)) as day_',CAST(delta AS STRING),'_balance'),
      ',' ORDER BY delta
   ) 
   ,' FROM temp GROUP BY subject_id, hm_id, icu_id ORDER BY subject_id, hm_id, icu_id')
FROM (
  SELECT delta 
  FROM temp
  GROUP BY delta
) 

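The result of Step 1 is the text of the final query that you need to run in Step 2. The generated text references temp, so run it with the same WITH temp AS (...) clause in front of it, as in Step 2.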
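The string building done by Step 1 can be mirrored in a few lines of Python, which makes it easier to see exactly what text comes out. This is a sketch only: build_pivot_query is a hypothetical helper name, and interpolating values straight into SQL is acceptable here only because the deltas are integers computed from the data, not user input.

```python
def build_pivot_query(deltas):
    """Build the same SELECT text that Step 1 generates, for the given delta values."""
    # One MAX(IF(...)) column per distinct delta, in ascending order, joined with ','
    columns = ",".join(
        f" MAX(IF(delta = {d}, balance, NULL)) as day_{d}_balance"
        for d in sorted(set(deltas))
    )
    return (
        "SELECT subject_id, hm_id, icu_id,"
        + columns
        + " FROM temp GROUP BY subject_id, hm_id, icu_id ORDER BY subject_id, hm_id, icu_id"
    )

print(build_pivot_query([2, 1, 2]))
```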

Step 2 - run the generated query

#standardSQL
WITH temp AS (
  SELECT subject_id, hm_id, icu_id, balance, 
    DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
  FROM `project.dataset.table` 
)
SELECT subject_id, hm_id, icu_id, 
  MAX(IF(delta = 1, balance, NULL)) AS day_1_balance, 
  MAX(IF(delta = 2, balance, NULL)) AS day_2_balance 
FROM temp 
GROUP BY subject_id, hm_id, icu_id 
-- ORDER BY subject_id, hm_id, icu_id