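We switched to a new platform on January 1, and I need to merge two tables to get a data source that contains both the old and the new data. However, some accounts had to move off the old platform before January 1.
The new table has December data for every account, but I only want to use the new December data where there is no old December data. How can I merge the tables so that most accounts take new data starting January 1, while the exception accounts take new data starting from the appropriate day in December?
For example: for Account1 I need new data from January 1 onward; for Account2 I need new data from December 30 onward; for Account3 I need new data from December 31 onward.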
Old Table
------------------------------------
Account Date Sales
------------------------------------
Account1 12-29-18 10
Account1 12-30-18 10
Account1 12-31-18 5
Account2 12-29-18 10
Account3 12-29-18 20
Account3 12-30-18 10
New Table
------------------------------------
Account Date Sales
------------------------------------
Account1 12-29-18 10
Account1 12-30-18 10
Account1 12-31-18 5
Account1 01-01-19 20
Account2 12-30-18 15
Account2 12-31-18 20
Account2 01-01-19 10
Account3 12-30-18 10
Account3 12-31-18 20
Account3 01-01-19 5
Output
------------------------------------
Account Date Sales
------------------------------------
Account1 12-29-18 10
Account1 12-30-18 10
Account1 12-31-18 5
Account1 01-01-19 20
Account2 12-29-18 10
Account2 12-30-18 15
Account2 12-31-18 20
Account2 01-01-19 10
Account3 12-29-18 20
Account3 12-30-18 10
Account3 12-31-18 20
Account3 01-01-19 5
Answer 0 (score: 1)
Below is for BigQuery Standard SQL
#standardSQL
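SELECT account, date,
  -- keep one sales value per account and date: the row whose source tag sorts first
  ARRAY_AGG(sales ORDER BY data LIMIT 1)[OFFSET(0)] sales
FROM (
  -- tag every row with the table it came from
  SELECT 'old' data, * FROM `project.dataset.old_table` UNION ALL
  SELECT 'new' data, * FROM `project.dataset.new_table`
)
GROUP BY account, date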
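Note that ORDER BY data sorts the tag 'new' before 'old', so when both tables hold a row for the same account and date, the query above keeps the new-table value. In the sample data the overlapping rows are identical, so the result is not affected. If the old value should win instead, as the wording of the question suggests, one possible variant is sketched below. The priority column is a name made up for this illustration, not something from the original tables.

```sql
#standardSQL
SELECT account, date,
  -- lowest priority number wins: 1 = old table, 2 = new table
  ARRAY_AGG(sales ORDER BY priority LIMIT 1)[OFFSET(0)] sales
FROM (
  -- priority is a hypothetical helper column used only for tie-breaking
  SELECT 1 priority, * FROM `project.dataset.old_table` UNION ALL
  SELECT 2 priority, * FROM `project.dataset.new_table`
)
GROUP BY account, date
```

A numeric priority makes the choice explicit instead of relying on how the tag strings happen to sort.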
You can test and play with the above using the sample data from your question, as below:
#standardSQL
WITH `project.dataset.old_table` AS (
SELECT 'Account1' account, '12-29-18' date, 10 sales UNION ALL
SELECT 'Account1', '12-30-18', 10 UNION ALL
SELECT 'Account1', '12-31-18', 5 UNION ALL
SELECT 'Account2', '12-29-18', 10 UNION ALL
SELECT 'Account3', '12-29-18', 20 UNION ALL
SELECT 'Account3', '12-30-18', 10
), `project.dataset.new_table` AS (
SELECT 'Account1' account, '12-29-18' date, 10 sales UNION ALL
SELECT 'Account1', '12-30-18', 10 UNION ALL
SELECT 'Account1', '12-31-18', 5 UNION ALL
SELECT 'Account1', '01-01-19', 20 UNION ALL
SELECT 'Account2', '12-30-18', 15 UNION ALL
SELECT 'Account2', '12-31-18', 20 UNION ALL
SELECT 'Account2', '01-01-19', 10 UNION ALL
SELECT 'Account3', '12-30-18', 10 UNION ALL
SELECT 'Account3', '12-31-18', 20 UNION ALL
SELECT 'Account3', '01-01-19', 5
)
SELECT account, date,
ARRAY_AGG(sales ORDER BY data LIMIT 1)[OFFSET(0)] sales
FROM (
SELECT 'old' data, * FROM `project.dataset.old_table` UNION ALL
SELECT 'new' data, * FROM `project.dataset.new_table`
)
GROUP BY account, date
ORDER BY account, PARSE_DATE('%m-%d-%y', date)
with result
Row account date sales
1 Account1 12-29-18 10
2 Account1 12-30-18 10
3 Account1 12-31-18 5
4 Account1 01-01-19 20
5 Account2 12-29-18 10
6 Account2 12-30-18 15
7 Account2 12-31-18 20
8 Account2 01-01-19 10
9 Account3 12-29-18 20
10 Account3 12-30-18 10
11 Account3 12-31-18 20
12 Account3 01-01-19 5
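If you would rather express the per-account cutover described in the question directly, a rough sketch is to find each account's last date in the old table and take new-table rows only after that date. This is an alternative approach, not the query above. The names cutover and last_old_date are hypothetical, and the sketch assumes date is stored as a string in MM-DD-YY format, as in the sample data.

```sql
#standardSQL
WITH cutover AS (
  -- last day each account has data on the old platform
  SELECT account, MAX(PARSE_DATE('%m-%d-%y', date)) AS last_old_date
  FROM `project.dataset.old_table`
  GROUP BY account
)
-- every row from the old table
SELECT account, date, sales
FROM `project.dataset.old_table`
UNION ALL
-- new-table rows dated after the account's last old-table date
SELECT n.account, n.date, n.sales
FROM `project.dataset.new_table` n
LEFT JOIN cutover c ON n.account = c.account
WHERE c.last_old_date IS NULL  -- account never appeared in the old table
   OR PARSE_DATE('%m-%d-%y', n.date) > c.last_old_date
```

Against the sample data this should return the same 12 rows as above; add the ORDER BY from the test query if you need them sorted. Unlike the GROUP BY version, it drops any new-table row dated on or before the account's last old-table date, even if the old table has a gap on that day.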