Given I have multiple tables in BigQuery, hence I have multiple SQL-statements that gives me "the number of X per day". For example:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as installs
FROM database.table1
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | installs |
-------------------------
| 2017-01-01 | 11 |
| 2017-01-02 | 22 |
etc
Another statement:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as uninstalls
FROM database.table2
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | uninstalls |
---------------------------
| 2017-01-02 | 22 |
| 2017-01-03 | 33 |
etc
Another statement:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as cases
FROM database.table3
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | cases |
----------------------
| 2017-01-01 | 11 |
| 2017-01-03 | 33 |
etc
etc
Now I need to combine all these into a single SELECT
statement that gives the following results:
| day | installs | uninstalls | cases |
----------------------------------------------
| 2017-01-01 | 11 | 0 | 11 |
| 2017-01-02 | 22 | 22 | 0 |
| 2017-01-03 | 0 | 33 | 33 |
etc
Is this even possible?
Or what's the closest SQL-statement I can write that would give me a similar result?
Any feedback is appreciated!
答案 0 :(得分:2)
这是一个自包含的示例,可能有助于您入门。它使用两个虚拟表InstallEvents
和UninstallEvents
,其中包含相应操作的时间戳。它创建一个名为StartAndEnd
的公用表表达式,用于计算这些事件的最小和最大日期,以便确定要聚合的日期,然后联合InstallEvents
和UninstallEvents
的内容,计算每一天的事件。
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
),
StartAndEnd AS (
SELECT MIN(DATE(timestamp)) AS min_date, MAX(DATE(timestamp)) AS max_date
FROM (
SELECT * FROM InstallEvents UNION ALL
SELECT * FROM UninstallEvents
)
)
SELECT
day,
COUNTIF(is_install AND DATE(timestamp) = day) AS installs,
COUNTIF(NOT is_install AND DATE(timestamp) = day) AS uninstalls
FROM (
SELECT *, true AS is_install
FROM InstallEvents UNION ALL
SELECT *, false
FROM UninstallEvents
)
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY(
(SELECT min_date FROM StartAndEnd),
(SELECT max_date FROM StartAndEnd)
)) AS day
GROUP BY day
ORDER BY day;
如果您事先了解开始日期和结束日期,则可以在查询中对其进行硬编码,然后省略StartAndEnd
CTE:
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
)
SELECT
day,
COUNTIF(is_install AND DATE(timestamp) = day) AS installs,
COUNTIF(NOT is_install AND DATE(timestamp) = day) AS uninstalls
FROM (
SELECT *, true AS is_install
FROM InstallEvents UNION ALL
SELECT *, false
FROM UninstallEvents
)
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-01-04')) AS day
GROUP BY day
ORDER BY day;
要查看示例数据中的事件,请使用联合内容的查询:
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
)
SELECT timestamp, true AS is_install
FROM InstallEvents UNION ALL
SELECT timestamp, false
FROM UninstallEvents;
答案 1 :(得分:1)
以下是BigQuery Standard SQL
#standardSQL
WITH calendar AS (
SELECT day
FROM (
SELECT MIN(min_day) AS min_day, MAX(max_day) AS max_day
FROM (
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table1` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table2` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table3`
)
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) AS day
)
SELECT
c.day AS day,
IFNULL(SUM(installs), 0) AS installs,
IFNULL(SUM(uninstalls), 0) AS uninstalls,
IFNULL(SUM(cases),0) AS cases
FROM calendar AS c
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) installs FROM `database.table1` GROUP BY day) t1 ON t1.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) uninstalls FROM `database.table2` GROUP BY day) t2 ON t2.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) cases FROM `database.table3` GROUP BY day) t3 ON t3.day = c.day
GROUP BY day
HAVING installs + uninstalls + cases > 0
-- ORDER BY day
请注意:您使用时间戳作为列名称,这不是最佳做法,因为它是关键字,因此在我的示例中,我保留您的命名,但请考虑更改此名称!
您可以使用以下虚拟数据
测试/播放此解决方案#standardSQL
WITH `database.table1` AS (
SELECT TIMESTAMP '2017-01-01' AS timestamp, 1 AS installs
UNION ALL SELECT TIMESTAMP '2017-01-01', 22
),
`database.table2` AS (
SELECT TIMESTAMP '2016-12-01' AS timestamp, 1 AS installs UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL
SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22
),
`database.table3` AS (
SELECT TIMESTAMP '2017-01-01' AS timestamp, 1 AS installs UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL
SELECT TIMESTAMP '2017-01-10', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22
),
calendar AS (
SELECT day
FROM (
SELECT MIN(min_day) AS min_day, MAX(max_day) AS max_day
FROM (
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table1` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table2` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table3`
)
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) AS day
)
SELECT
c.day AS day,
IFNULL(SUM(installs), 0) AS installs,
IFNULL(SUM(uninstalls), 0) AS uninstalls,
IFNULL(SUM(cases),0) AS cases
FROM calendar AS c
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) installs FROM `database.table1` GROUP BY day) t1 ON t1.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) uninstalls FROM `database.table2` GROUP BY day) t2 ON t2.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) cases FROM `database.table3` GROUP BY day) t3 ON t3.day = c.day
GROUP BY day
HAVING installs + uninstalls + cases > 0
ORDER BY day
答案 2 :(得分:0)
我对bigquery不太熟悉,所以这可能不是复制粘贴的答案。
您首先必须构建一个压缩表,以确保您拥有所有日期。 Here's sql server的一个例子。也许有大型查询的例子。以下假定Calander
中包含Date
属性的timestamp
表。
一旦你有了calander表,你可以将所有表加入到:
SELECT FORMAT_TIMESTAMP("%F",C.Date) AS day
, COUNT(T1.DATE(T1.TIMESTAMP)) AS installs --Here you could also use your FORMAT_TIMESTAMP
, COUNT(T1.DATE(T2.TIMESTAMP)) AS uninstalls
FROM Calander C
LEFT JOIN database.table1 T1
ON DATE(T1.TIMESTAMP) = DATE(C.Date) --Convert to date to remove times, you could also use your FORMAT_TIMESTAMP
LEFT JOIN database.table2 T2
ON DATE(T2.TIMESTAMP) = DATE(C.Date)
GROUP BY day
ORDER BY day ASC