Joining a table to its own last-year results when dates are missing

Time: 2018-05-15 17:49:59

Tags: sql join google-bigquery standard-sql

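I have a table of dates, channels, and sessions, and I'm trying to use a join to add columns to each row containing the corresponding values from last year. However, I also want to include dates that have a value last year but none this year, and vice versa. The problem is that for dates that don't exist, my row count doubles. Any ideas on how to solve this?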

SELECT
  ty.*,
  ly.Date as Date_LY,
  ly.Sessions as Sessions_LY
FROM
  `testjoin` AS ty
FULL JOIN
  `testjoin` as ly
  ON
  ly.Date = DATE_SUB(ty.Date, INTERVAL 1 YEAR)
  AND ly.Channel = ty.Channel

Data:

Date        Channel Sessions
01/01/2017  Email   5
02/02/2017  Email   10
01/01/2018  Email   11
02/02/2018  Email   17
01/01/2017  Organic 10
02/02/2017  Organic 15
01/01/2018  Organic 20

Desired output:

Date    Channel Sessions    Sessions_LY
01/01/2017  Email   5   null
02/02/2017  Email   10  null
01/01/2018  Email   11  5
02/02/2018  Email   17  10
01/01/2017  Organic 10  null
02/02/2017  Organic 15  null
01/01/2018  Organic 20  10
02/02/2018  Organic null    15

Actual output:

Date        Channel Sessions    Sessions_LY
01/01/2017  Organic 10  
02/02/2017  Email   10  
02/02/2017  Organic 15  
01/01/2017  Email   5   
01/01/2018  Email   11  5
01/01/2018  Organic 20  10
02/02/2018  Email   17  10
                        15
                        11
                        20
                        17
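
A note on the four blank rows above: in a FULL JOIN, every `ly` row that has no matching `ty` row one year later is still emitted, with all `ty` columns NULL. Here that covers the three 2018 rows (nothing exists in 2019) and 02/02/2017 Organic. Because the query selects `ty.*`, the date and channel of those rows are lost. The following is a minimal sketch, not a definitive fix, of one way to keep the FULL JOIN: rebuild the keys with COALESCE and drop rows projected past the latest year in the table. The inline `testjoin` data, the DATE typing of the `Date` column, the CTE name `joined`, and the year-based filter are assumptions made for illustration.

```sql
#standardSQL
-- Sample data copied from the question; Date is typed as DATE here (assumption).
WITH testjoin AS (
  SELECT DATE '2017-01-01' AS Date, 'Email' AS Channel, 5 AS Sessions UNION ALL
  SELECT DATE '2017-02-02', 'Email', 10 UNION ALL
  SELECT DATE '2018-01-01', 'Email', 11 UNION ALL
  SELECT DATE '2018-02-02', 'Email', 17 UNION ALL
  SELECT DATE '2017-01-01', 'Organic', 10 UNION ALL
  SELECT DATE '2017-02-02', 'Organic', 15 UNION ALL
  SELECT DATE '2018-01-01', 'Organic', 20
), joined AS (
  SELECT
    -- When ty is missing, rebuild its keys from the ly row shifted forward one year.
    COALESCE(ty.Date, DATE_ADD(ly.Date, INTERVAL 1 YEAR)) AS Date,
    COALESCE(ty.Channel, ly.Channel) AS Channel,
    ty.Sessions AS Sessions,
    ly.Sessions AS Sessions_LY
  FROM testjoin AS ty
  FULL JOIN testjoin AS ly
    ON ly.Date = DATE_SUB(ty.Date, INTERVAL 1 YEAR)
    AND ly.Channel = ty.Channel
)
SELECT Date, Channel, Sessions, Sessions_LY
FROM joined
-- Drop rows projected into a year later than any year present in the table
-- (these come from the latest year's rows acting as "last year").
WHERE EXTRACT(YEAR FROM Date) <= (SELECT MAX(EXTRACT(YEAR FROM Date)) FROM testjoin)
ORDER BY Channel, Date
```

On the question's sample data this should return the eight rows of the desired output, including the 2018-02-02 Organic row with null Sessions and a Sessions_LY of 15.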

3 answers:

Answer 0 (score: 1)

I think you want a cross join to generate the rows and left joins to bring in the values:

SELECT d.Date, c.Channel, ty.Sessions, ty_prev.Sessions AS Sessions_LY
FROM (SELECT DISTINCT ty.Date
      FROM testjoin ty
     ) d CROSS JOIN
     (SELECT DISTINCT ty.Channel FROM testjoin ty) c LEFT JOIN
     testjoin ty
     ON ty.Date = d.Date AND ty.Channel = c.Channel LEFT JOIN
     testjoin ty_prev
     -- Match last year's row for the same channel (ty_prev, not ty, so the
     -- match still works when this year's row is missing).
     ON ty_prev.Date = DATE_SUB(d.Date, INTERVAL 1 YEAR) AND ty_prev.Channel = c.Channel;
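
To try this approach without a real table, here is a self-contained sketch in BigQuery Standard SQL. The inline `testjoin` CTE copies the question's sample data, and typing `Date` as DATE is an assumption for illustration. The CTE names `d` and `c` mirror the aliases in the answer. The sketch joins last year's rows on `ty_prev.Channel` and uses `DATE_SUB` as in the question.

```sql
#standardSQL
-- Sample data copied from the question; Date is typed as DATE here (assumption).
WITH testjoin AS (
  SELECT DATE '2017-01-01' AS Date, 'Email' AS Channel, 5 AS Sessions UNION ALL
  SELECT DATE '2017-02-02', 'Email', 10 UNION ALL
  SELECT DATE '2018-01-01', 'Email', 11 UNION ALL
  SELECT DATE '2018-02-02', 'Email', 17 UNION ALL
  SELECT DATE '2017-01-01', 'Organic', 10 UNION ALL
  SELECT DATE '2017-02-02', 'Organic', 15 UNION ALL
  SELECT DATE '2018-01-01', 'Organic', 20
),
d AS (SELECT DISTINCT Date FROM testjoin),        -- every date seen in the table
c AS (SELECT DISTINCT Channel FROM testjoin)      -- every channel seen in the table
SELECT
  d.Date,
  c.Channel,
  ty.Sessions,
  ty_prev.Sessions AS Sessions_LY
FROM d
CROSS JOIN c                                      -- one row per (date, channel) pair
LEFT JOIN testjoin AS ty
  ON ty.Date = d.Date AND ty.Channel = c.Channel
LEFT JOIN testjoin AS ty_prev
  ON ty_prev.Date = DATE_SUB(d.Date, INTERVAL 1 YEAR)
  AND ty_prev.Channel = c.Channel
ORDER BY c.Channel, d.Date
```

With four distinct dates and two channels, the cross join yields eight rows, so the 2018-02-02 Organic row appears even though the table has no such row.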

Answer 1 (score: 0)

Use datepart, depending on your needs:

-- Note: DATEPART is SQL Server (T-SQL) syntax; BigQuery Standard SQL uses EXTRACT instead.
with t (date, channel, sessions) as (
  select '01/01/2017', 'Email', 5 union all
  select '02/02/2017', 'Email', 10 union all
  select '01/01/2018', 'Email', 11 union all
  select '02/02/2018', 'Email', 17 union all
  select '01/01/2017', 'Organic', 10 union all
  select '02/02/2017', 'Organic', 15 union all
  select '01/01/2018', 'Organic', 20
)
select *,
       -- order by year inside each (channel, month) partition so that
       -- lag() deterministically picks the previous year's row
       lag(sessions) over (partition by d.channel, datepart(mm, d.date)
                           order by datepart(yyyy, d.date)) l
from (select *
      from ((SELECT DISTINCT t.Date FROM t) d
            CROSS JOIN
            (SELECT DISTINCT t.channel FROM t) c)) d
left join t
  on d.Date = t.Date and d.channel = t.channel
order by d.channel, datepart(yyyy, d.date), datepart(mm, d.date)

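Answer 2 (score: 0)

Everything in your question suggests that you only have the current year (2018) and the previous year (2017), so the following is based on that assumption and is written for BigQuery Standard SQL: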

    
#standardSQL
WITH temp AS (
  SELECT PARSE_DATE('%m/%d/%Y', Date) Date, Channel, Sessions
  FROM `project.dataset.your_table` 
), all_days AS ( 
  SELECT Date, Channel FROM temp UNION DISTINCT
  SELECT DATE_ADD(Date, INTERVAL 1 YEAR), Channel 
    FROM temp WHERE EXTRACT(YEAR FROM Date) = 2017
), all_data AS (
  SELECT Date, Channel, Sessions, FORMAT_DATE('%m%d', Date) day
  FROM all_days
  LEFT JOIN temp USING(Date, Channel)
)
SELECT Date, Channel, Sessions, 
  LAG(Sessions) OVER(PARTITION BY day, Channel ORDER BY Date) Sessions_LY
FROM all_data

You can test / play with the above using the dummy data from your question, as below:

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT '01/01/2017' Date, 'Email' Channel, 5 Sessions UNION ALL
  SELECT '02/02/2017', 'Email', 10 UNION ALL
  SELECT '01/01/2018', 'Email', 11 UNION ALL
  SELECT '02/02/2018', 'Email', 17 UNION ALL
  SELECT '01/01/2017', 'Organic', 10 UNION ALL
  SELECT '02/02/2017', 'Organic', 15 UNION ALL
  SELECT '01/01/2018', 'Organic', 20 
), temp AS (
  SELECT PARSE_DATE('%m/%d/%Y', Date) Date, Channel, Sessions
  FROM `project.dataset.your_table` 
), all_days AS ( 
  SELECT Date, Channel FROM temp UNION DISTINCT
  SELECT DATE_ADD(Date, INTERVAL 1 YEAR), Channel 
    FROM temp WHERE EXTRACT(YEAR FROM Date) = 2017
), all_data AS (
  SELECT Date, Channel, Sessions, FORMAT_DATE('%m%d', Date) day
  FROM all_days
  LEFT JOIN temp USING(Date, Channel)
)
SELECT Date, Channel, Sessions, 
  LAG(Sessions) OVER(PARTITION BY day, Channel ORDER BY Date) Sessions_LY
FROM all_data
ORDER BY 2, 1   

The result is:

Row Date        Channel     Sessions    Sessions_LY  
1   2017-01-01  Email       5           null     
2   2017-02-02  Email       10          null     
3   2018-01-01  Email       11          5    
4   2018-02-02  Email       17          10   
5   2017-01-01  Organic     10          null     
6   2017-02-02  Organic     15          null     
7   2018-01-01  Organic     20          10   
8   2018-02-02  Organic     null        15