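I have a table of dates, channels and sessions, and I am trying to use a join to add columns to each row with the corresponding values from last year. However, I also want to include this year's values for dates that are missing last year, and vice versa. The problem is that I get double the number of rows for the dates that are not present. Any ideas on how to solve this?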
SELECT
  ty.*,
  ly.Date AS Date_LY,
  ly.Sessions AS Sessions_LY
FROM
  `testjoin` AS ty
FULL JOIN
  `testjoin` AS ly
ON
  ly.Date = DATE_SUB(ty.Date, INTERVAL 1 YEAR)
  AND ly.Channel = ty.Channel
Data:
Date Channel Sessions
01/01/2017 Email 5
02/02/2017 Email 10
01/01/2018 Email 11
02/02/2018 Email 17
01/01/2017 Organic 10
02/02/2017 Organic 15
01/01/2018 Organic 20
Expected output:
Date Channel Sessions Sessions_LY
01/01/2017 Email 5 null
02/02/2017 Email 10 null
01/01/2018 Email 11 5
02/02/2018 Email 17 10
01/01/2017 Organic 10 null
02/02/2017 Organic 15 null
01/01/2018 Organic 20 10
02/02/2018 Organic null 15
Actual output:
Date Channel Sessions Sessions_LY
01/01/2017 Organic 10 null
02/02/2017 Email 10 null
02/02/2017 Organic 15 null
01/01/2017 Email 5 null
01/01/2018 Email 11 5
01/01/2018 Organic 20 10
02/02/2018 Email 17 10
null null null 15
null null null 11
null null null 20
null null null 17
Answer 0 (score: 1)
I think you want a cross join to generate the rows and a left join to bring in the values:
SELECT d.Date, c.Channel, ty.Sessions, ty_prev.Sessions AS Sessions_LY
FROM (SELECT DISTINCT Date FROM testjoin) d
CROSS JOIN (SELECT DISTINCT Channel FROM testjoin) c
LEFT JOIN testjoin ty
  ON ty.Date = d.Date AND ty.Channel = c.Channel
LEFT JOIN testjoin ty_prev
  -- match last year's row on ty_prev's own channel, not on ty's
  ON ty_prev.Date = DATE_SUB(d.Date, INTERVAL 1 YEAR) AND ty_prev.Channel = c.Channel;
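To try the cross join plus left join idea without a BigQuery project, here is a minimal sketch that runs a query of the same shape against an in-memory SQLite database from Python. Several things are assumptions made for illustration and are not from the question: the dates are rewritten as ISO strings, SQLite's `date(x, '-1 year')` stands in for `DATE_SUB`, and the helper name `year_over_year` is hypothetical. Note that the last-year join compares the channel of the last-year alias itself.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings
# (an assumption, so that SQLite's date() can shift them by one year).
ROWS = [
    ("2017-01-01", "Email", 5),
    ("2017-02-02", "Email", 10),
    ("2018-01-01", "Email", 11),
    ("2018-02-02", "Email", 17),
    ("2017-01-01", "Organic", 10),
    ("2017-02-02", "Organic", 15),
    ("2018-01-01", "Organic", 20),
]

# Cross join every distinct date with every distinct channel to generate
# the rows, then left join the table twice to pull in this year's and
# last year's sessions.
QUERY = """
SELECT d.Date, c.Channel, ty.Sessions, ly.Sessions AS Sessions_LY
FROM (SELECT DISTINCT Date FROM testjoin) AS d
CROSS JOIN (SELECT DISTINCT Channel FROM testjoin) AS c
LEFT JOIN testjoin AS ty
       ON ty.Date = d.Date AND ty.Channel = c.Channel
LEFT JOIN testjoin AS ly
       ON ly.Date = date(d.Date, '-1 year') AND ly.Channel = c.Channel
ORDER BY c.Channel, d.Date
"""


def year_over_year(rows):
    """Return (Date, Channel, Sessions, Sessions_LY) for every date/channel pair."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testjoin (Date TEXT, Channel TEXT, Sessions INTEGER)")
    con.executemany("INSERT INTO testjoin VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in year_over_year(ROWS):
        print(row)
```

With the sample data this yields one row per date and channel, including the Organic row for 2018-02-02, which has no sessions this year but does have a value for last year.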
Answer 1 (score: 0)
Use datepart, adapting it to your needs:
with t (date, channel, sessions) as
(
    select '01/01/2017', 'Email', 5 union all
    select '02/02/2017', 'Email', 10 union all
    select '01/01/2018', 'Email', 11 union all
    select '02/02/2018', 'Email', 17 union all
    select '01/01/2017', 'Organic', 10 union all
    select '02/02/2017', 'Organic', 15 union all
    select '01/01/2018', 'Organic', 20
)
select d.date, d.channel, t.sessions,
       -- same channel and same month, ordered by year, so lag() reads the previous year's value
       -- (assumes at most one date per month and year, as in the sample data)
       lag(t.sessions) over (partition by d.channel, datepart(mm, d.date)
                             order by datepart(yyyy, d.date)) as sessions_ly
from (select dt.date, c.channel
      from (select distinct date from t) dt
      cross join (select distinct channel from t) c) d
left join t on d.date = t.date and d.channel = t.channel
order by d.channel, datepart(yyyy, d.date), datepart(mm, d.date)
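Answer 2 (score: 0)
Everything in your question suggests that you only have the current year (2018) and the previous year (2017), so the following is based on that assumption and is written for BigQuery Standard SQL: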
#standardSQL
WITH temp AS (
SELECT PARSE_DATE('%m/%d/%Y', Date) Date, Channel, Sessions
FROM `project.dataset.your_table`
), all_days AS (
SELECT Date, Channel FROM temp UNION DISTINCT
SELECT DATE_ADD(Date, INTERVAL 1 YEAR), Channel
FROM temp WHERE EXTRACT(YEAR FROM Date) = 2017
), all_data AS (
SELECT Date, Channel, Sessions, FORMAT_DATE('%m%d', Date) day
FROM all_days
LEFT JOIN temp USING(Date, Channel)
)
SELECT Date, Channel, Sessions,
LAG(Sessions) OVER(PARTITION BY day, Channel ORDER BY Date) Sessions_LY
FROM all_data
You can test and play with the above using the dummy data from your question, as follows:
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT '01/01/2017' Date, 'Email' Channel, 5 Sessions UNION ALL
SELECT '02/02/2017', 'Email', 10 UNION ALL
SELECT '01/01/2018', 'Email', 11 UNION ALL
SELECT '02/02/2018', 'Email', 17 UNION ALL
SELECT '01/01/2017', 'Organic', 10 UNION ALL
SELECT '02/02/2017', 'Organic', 15 UNION ALL
SELECT '01/01/2018', 'Organic', 20
), temp AS (
SELECT PARSE_DATE('%m/%d/%Y', Date) Date, Channel, Sessions
FROM `project.dataset.your_table`
), all_days AS (
SELECT Date, Channel FROM temp UNION DISTINCT
SELECT DATE_ADD(Date, INTERVAL 1 YEAR), Channel
FROM temp WHERE EXTRACT(YEAR FROM Date) = 2017
), all_data AS (
SELECT Date, Channel, Sessions, FORMAT_DATE('%m%d', Date) day
FROM all_days
LEFT JOIN temp USING(Date, Channel)
)
SELECT Date, Channel, Sessions,
LAG(Sessions) OVER(PARTITION BY day, Channel ORDER BY Date) Sessions_LY
FROM all_data
ORDER BY 2, 1
The result is:
Row Date Channel Sessions Sessions_LY
1 2017-01-01 Email 5 null
2 2017-02-02 Email 10 null
3 2018-01-01 Email 11 5
4 2018-02-02 Email 17 10
5 2017-01-01 Organic 10 null
6 2017-02-02 Organic 15 null
7 2018-01-01 Organic 20 10
8 2018-02-02 Organic null 15
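The same three steps (build the full set of dates, left join the sessions, then read the previous year with LAG) can also be sketched outside BigQuery. The version below runs on SQLite from Python and is only an illustration under stated assumptions: dates are ISO strings, `date(x, '+1 year')` and `strftime('%m%d', x)` stand in for `DATE_ADD` and `FORMAT_DATE`, the hard-coded 2017 filter is swapped for "the shifted date is no later than the latest date in the table", and the helper name `sessions_with_last_year` is hypothetical. It needs SQLite 3.25 or newer for window functions, and LAG returns the previous row in the partition, so it equals last year's value only when no year is skipped.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings (an assumption).
ROWS = [
    ("2017-01-01", "Email", 5),
    ("2017-02-02", "Email", 10),
    ("2018-01-01", "Email", 11),
    ("2018-02-02", "Email", 17),
    ("2017-01-01", "Organic", 10),
    ("2017-02-02", "Organic", 15),
    ("2018-01-01", "Organic", 20),
]

QUERY = """
WITH all_days AS (
    -- every existing date/channel pair ...
    SELECT Date, Channel FROM testjoin
    UNION
    -- ... plus each pair shifted forward one year, capped at the latest
    -- date present (replaces the hard-coded 2017 filter; an assumption)
    SELECT date(Date, '+1 year'), Channel FROM testjoin
    WHERE date(Date, '+1 year') <= (SELECT MAX(Date) FROM testjoin)
),
all_data AS (
    SELECT a.Date, a.Channel, t.Sessions, strftime('%m%d', a.Date) AS day
    FROM all_days AS a
    LEFT JOIN testjoin AS t ON t.Date = a.Date AND t.Channel = a.Channel
)
SELECT Date, Channel, Sessions,
       LAG(Sessions) OVER (PARTITION BY day, Channel ORDER BY Date) AS Sessions_LY
FROM all_data
ORDER BY Channel, Date
"""


def sessions_with_last_year(rows):
    """Return (Date, Channel, Sessions, Sessions_LY) rows, filling missing dates."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testjoin (Date TEXT, Channel TEXT, Sessions INTEGER)")
    con.executemany("INSERT INTO testjoin VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in sessions_with_last_year(ROWS):
        print(row)
```

On the sample data this produces the same eight rows as the BigQuery result above, with None in place of null.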