Calculating the difference between two rows per group in BigQuery

Time: 2020-08-28 08:14:00

Tags: sql count google-bigquery sum pivot

with
  my_stats as (
    select 24996 as competitionId, 17 as playerId, 'on' as onOff, 8 as fga, 4 as fgm, 0.50 as fgPct union all
    select 24996 as competitionId, 17 as playerId, 'off' as onOff, 5 as fga, 3 as fgm, 0.60 as fgPct union all
    select 24996 as competitionId, 24 as playerId, 'on' as onOff, 9 as fga, 6 as fgm, 0.67 as fgPct union all
    select 24996 as competitionId, 24 as playerId, 'off' as onOff, 3 as fga, 1 as fgm, 0.33 as fgPct union all
    select 24996 as competitionId, 27 as playerId, 'on' as onOff, 5 as fga, 4 as fgm, 0.8 as fgPct
  ),
  
  my_output as (
    select 24996 as competitionId, 17 as playerId, 'diff' as onOff, 3 as fga, 1 as fgm, -0.1 as fgPct union all
    select 24996 as competitionId, 24 as playerId, 'diff' as onOff, 6 as fga, 5 as fgm, 0.34 as fgPct
  )
  

select * from my_stats
-- to see the desired result instead, run: select * from my_output

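This is a simple example to illustrate the problem we are trying to solve. We have a table my_stats whose primary key is the combination of competitionId, playerId, onOff, and the onOff column can only be 'on' or 'off'. For a single competitionId, playerId pair (which has two rows, one 'on' and one 'off'), we want to subtract the values in all the other columns (on - off).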

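Hopefully the my_output table makes the required output clear. For playerId = 27, this player has no 'off' row, so there is nothing to compute and the player can be dropped from the output.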

3 answers:

Answer 0: (score: 1)

You can use conditional aggregation:

select
    competitionId,
    playerId,
    'diff' as onOff,
    sum(case when onOff = 'on' then fga   else - fga   end) fga,
    sum(case when onOff = 'on' then fgm   else - fgm   end) fgm,
    sum(case when onOff = 'on' then fgpct else - fgpct end) fgpct
from my_stats
where onOff in ('on', 'off')
group by competitionId, playerId
having count(*) = 2

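This groups the data by competition and player; the conditional sum() then computes the difference between the 'on' and 'off' values for each column. The having clause filters out groups for which both records are not available.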
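To sanity-check the conditional aggregation against the sample rows from the question, here is a minimal sketch that runs the same query locally. It assumes SQLite (via Python's standard sqlite3 module) as a stand-in for BigQuery, which is fine for this query because it only uses plain SQL; the helper name on_off_diff and the added order by are mine, not part of the answer.

```python
import sqlite3

# Sample rows from the question:
# (competitionId, playerId, onOff, fga, fgm, fgPct)
ROWS = [
    (24996, 17, "on", 8, 4, 0.50),
    (24996, 17, "off", 5, 3, 0.60),
    (24996, 24, "on", 9, 6, 0.67),
    (24996, 24, "off", 3, 1, 0.33),
    (24996, 27, "on", 5, 4, 0.80),
]

# Same conditional aggregation as above; "order by" is added only to make
# the row order deterministic for this check.
DIFF_SQL = """
select
    competitionId,
    playerId,
    'diff' as onOff,
    sum(case when onOff = 'on' then fga   else -fga   end) as fga,
    sum(case when onOff = 'on' then fgm   else -fgm   end) as fgm,
    sum(case when onOff = 'on' then fgPct else -fgPct end) as fgPct
from my_stats
where onOff in ('on', 'off')
group by competitionId, playerId
having count(*) = 2
order by playerId
"""


def on_off_diff(rows):
    """Load rows into an in-memory SQLite table (assumed stand-in for
    BigQuery) and return the on-minus-off rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_stats ("
        "competitionId integer, playerId integer, onOff text, "
        "fga integer, fgm integer, fgPct real)"
    )
    conn.executemany("insert into my_stats values (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(DIFF_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in on_off_diff(ROWS):
        print(row)
```

Players 17 and 24 come back with one 'diff' row each, and player 27 is dropped by the having clause because it has only an 'on' row. The fgPct differences are floating-point values, so compare them with a tolerance rather than with exact equality.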

Answer 1: (score: 1)

Another solution, based on a self-join:

select
    t1.competitionId,
    t1.playerId,
    'diff' as onOff,
    t1.fga - t2.fga as fga,
    t1.fgm - t2.fgm as fgm,
    t1.fgpct - t2.fgpct as fgpct
from my_stats as t1
join my_stats as t2
  on t1.competitionId = t2.competitionId
 and t1.playerId = t2.playerId
where t1.onOff = 'on'
  and t2.onOff = 'off'

You should check which approach is more efficient on your data.
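Before comparing efficiency, it is worth confirming that the two approaches return the same rows. The sketch below does that on the question's sample data, again assuming SQLite (Python's standard sqlite3 module) as a local stand-in for BigQuery; the helpers run and same_result are hypothetical names for illustration. It says nothing about performance in BigQuery itself, which you would measure there on real data.

```python
import sqlite3

# Sample rows from the question:
# (competitionId, playerId, onOff, fga, fgm, fgPct)
ROWS = [
    (24996, 17, "on", 8, 4, 0.50),
    (24996, 17, "off", 5, 3, 0.60),
    (24996, 24, "on", 9, 6, 0.67),
    (24996, 24, "off", 3, 1, 0.33),
    (24996, 27, "on", 5, 4, 0.80),
]

# Conditional aggregation (first answer).
CONDITIONAL_SQL = """
select competitionId, playerId, 'diff' as onOff,
    sum(case when onOff = 'on' then fga   else -fga   end) as fga,
    sum(case when onOff = 'on' then fgm   else -fgm   end) as fgm,
    sum(case when onOff = 'on' then fgPct else -fgPct end) as fgPct
from my_stats
where onOff in ('on', 'off')
group by competitionId, playerId
having count(*) = 2
order by playerId
"""

# Self-join (this answer); the inner join drops players missing either row.
SELF_JOIN_SQL = """
select t1.competitionId, t1.playerId, 'diff' as onOff,
    t1.fga - t2.fga as fga,
    t1.fgm - t2.fgm as fgm,
    t1.fgPct - t2.fgPct as fgPct
from my_stats as t1
join my_stats as t2
  on t1.competitionId = t2.competitionId
 and t1.playerId = t2.playerId
where t1.onOff = 'on'
  and t2.onOff = 'off'
order by t1.playerId
"""


def run(sql, rows=ROWS):
    """Run one query against a fresh in-memory copy of my_stats."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_stats ("
        "competitionId integer, playerId integer, onOff text, "
        "fga integer, fgm integer, fgPct real)"
    )
    conn.executemany("insert into my_stats values (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


def same_result(a, b, tol=1e-9):
    """Compare two result sets: keys exactly, numeric columns within tol."""
    if len(a) != len(b):
        return False
    for ra, rb in zip(a, b):
        if ra[:3] != rb[:3]:
            return False
        if any(abs(x - y) > tol for x, y in zip(ra[3:], rb[3:])):
            return False
    return True


if __name__ == "__main__":
    print(same_result(run(CONDITIONAL_SQL), run(SELF_JOIN_SQL)))
```

The two queries agree here because (competitionId, playerId, onOff) is the primary key, so each player has at most one 'on' and one 'off' row.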

Answer 2: (score: 0)

Below is for BigQuery Standard SQL:

#standardSQL
SELECT competitionId, playerId, 'diff' AS onOff,
  SUM(onOffSign * fga) AS fga,
  SUM(onOffSign * fgm) AS fgm,
  SUM(onOffSign * fgPct) AS fgPct  
FROM my_stats, 
  UNNEST([IF(onOff = 'on', 1, -1)]) onOffSign
GROUP BY competitionId, playerId
HAVING COUNT(1) = 2  