I am using Standard SQL in Google BigQuery. I have some metric data in this format:
Date | metric_name | metric_level
01/02/2019 | metric_one | 1
02/03/2019 | metric_one | 2
14/02/2019 | metric_two | 6
17/02/2019 | metric_two | 4
01/03/2019 | metric_three | 2
10/03/2019 | metric_three | 7
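I would like to get it into the format below, with one row per date going back one year and every metric filled in for each date. If a metric has no data on a particular date, it should use the most recent earlier data point: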
Date | metric_one | metric_two | metric_three
..........
01/02/2019 | 1 | null | null
02/02/2019 | 1 | null | null
03/02/2019 | 1 | null | null
...........
...........
13/02/2019 | 1 | null | null
14/02/2019 | 1 | 6 | null
15/02/2019 | 1 | 6 | null
...........
...........
09/03/2019 | 2 | 4 | 2
10/03/2019 | 2 | 4 | 7
11/03/2019 | 2 | 4 | 7
...........
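And so on.
I have managed to write code that does this, but I would like to know whether there is a more efficient way. There are actually more than 3 metrics, so any efficiency gain would save a lot of resources in the long run.
Here is my code: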
WITH date_arr AS (
  SELECT
    date
  FROM UNNEST(
    GENERATE_DATE_ARRAY(
      DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY),
      CURRENT_DATE(),
      INTERVAL 1 DAY
    )
  ) AS date
),
metric_one_raw AS (
  SELECT
    date,
    metric_level
  FROM database
  WHERE metric_name = 'metric_one'
),
metric_one_gapless AS (
  SELECT
    d.date AS date,
    IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_one
  FROM date_arr d
  LEFT JOIN metric_one_raw i
    ON d.date = i.date
  WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
),
metric_two_raw AS (
  SELECT
    date,
    metric_level
  FROM database
  WHERE metric_name = 'metric_two'
),
metric_two_gapless AS (
  SELECT
    d.date AS date,
    IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_two
  FROM date_arr d
  LEFT JOIN metric_two_raw i
    ON d.date = i.date
  WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
),
metric_three_raw AS (
  SELECT
    date,
    metric_level
  FROM database
  WHERE metric_name = 'metric_three'
),
metric_three_gapless AS (
  SELECT
    d.date AS date,
    IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_three
  FROM date_arr d
  LEFT JOIN metric_three_raw i
    ON d.date = i.date
  WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
)
SELECT
  *
FROM metric_one_gapless
LEFT JOIN metric_two_gapless USING(date)
LEFT JOIN metric_three_gapless USING(date)
I hope this makes sense. Thanks in advance!
Answer 0 (score: 0)
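You can do the following:
Use cross join to get all the rows.
Use left join to bring in the data.
Use last_value() to fill in the NULL values.
In other databases I would suggest lag(ignore nulls), but BigQuery does not support it.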
So:
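-- "metrics" stands for the source table (called "database" in the question)
select d as date, m.metric_name,
       -- keep the value for the day if present, otherwise carry the latest earlier one forward
       coalesce(mm.metric_level,
                last_value(mm.metric_level ignore nulls) over (partition by m.metric_name order by d)
               ) as metric_level
from (select distinct metric_name from metrics) m cross join
     unnest(generate_date_array(date_sub(current_date(), interval 1 year), current_date(), interval 1 day)) d left join
     metrics mm
     on mm.metric_name = m.metric_name and mm.date = d;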
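The query above returns one row per date and metric, whereas the question asks for one column per metric. A minimal sketch of one way to get that shape is to wrap the gap-filled result in a conditional aggregation. The assumptions here are not from the original posts: the inline `metrics` CTE stands in for the real table and holds only the sample rows from the question, the date range is fixed so those 2019 rows fall inside it, and the metric names are hard-coded because the output columns have to be known when the query is written.

```sql
-- Sketch only: the inline "metrics" CTE is a stand-in for the real table.
WITH metrics AS (
  SELECT DATE '2019-02-01' AS date, 'metric_one' AS metric_name, 1 AS metric_level UNION ALL
  SELECT DATE '2019-03-02', 'metric_one', 2 UNION ALL
  SELECT DATE '2019-02-14', 'metric_two', 6 UNION ALL
  SELECT DATE '2019-02-17', 'metric_two', 4 UNION ALL
  SELECT DATE '2019-03-01', 'metric_three', 2 UNION ALL
  SELECT DATE '2019-03-10', 'metric_three', 7
),
filled AS (
  -- One row per (date, metric), carrying the latest non-null value forward.
  SELECT
    d AS date,
    m.metric_name,
    LAST_VALUE(mm.metric_level IGNORE NULLS) OVER (
      PARTITION BY m.metric_name
      ORDER BY d
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS metric_level
  FROM (SELECT DISTINCT metric_name FROM metrics) AS m
  -- Fixed range for the sample data; for a rolling year, swap in
  -- DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR) and CURRENT_DATE().
  CROSS JOIN UNNEST(
    GENERATE_DATE_ARRAY(DATE '2019-02-01', DATE '2019-03-11', INTERVAL 1 DAY)
  ) AS d
  LEFT JOIN metrics AS mm
    ON mm.metric_name = m.metric_name AND mm.date = d
)
-- Pivot: each metric becomes its own column; MAX ignores the NULLs
-- produced for rows that belong to the other metrics.
SELECT
  date,
  MAX(IF(metric_name = 'metric_one', metric_level, NULL)) AS metric_one,
  MAX(IF(metric_name = 'metric_two', metric_level, NULL)) AS metric_two,
  MAX(IF(metric_name = 'metric_three', metric_level, NULL)) AS metric_three
FROM filled
GROUP BY date
ORDER BY date;
```

Because the window frame already includes the current row, LAST_VALUE(... IGNORE NULLS) on its own returns the day's value when there is one, so the outer coalesce is not needed in this sketch. With more metrics, only the final SELECT grows by one line per metric; the table is still read in a single pass instead of once per metric.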
Answer 1 (score: 0)