Most efficient way to partition rows by name and then turn each name into a column

Asked: 2019-06-06 11:13:32

Tags: sql google-bigquery standard-sql

I am using standard SQL in Google BigQuery. I have some metrics data in this format:

Date        | metric_name  | metric_level
01/02/2019  | metric_one   | 1
02/03/2019  | metric_one   | 2
14/02/2019  | metric_two   | 6
17/02/2019  | metric_two   | 4
01/03/2019  | metric_three | 2
10/03/2019  | metric_three | 7

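I would like to get it into the following format, with the date history going back one year and every metric filled in for every date. If a metric has no data on a given date, it should use the most recent data point: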

Date        | metric_one   | metric_two   | metric_three
..........
01/02/2019  | 1            | null         | null
02/02/2019  | 1            | null         | null
03/02/2019  | 1            | null         | null
...........
...........
13/02/2019  | 1            | null         | null
14/02/2019  | 1            | 6            | null
15/02/2019  | 1            | 6            | null
...........
...........
09/03/2019  | 2            | 4            | 2
10/03/2019  | 2            | 4            | 7
11/03/2019  | 2            | 4            | 7
...........

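And so on.

I have managed to write some code that does this, but I would like to know whether there is a more efficient way. There are actually more than 3 metrics, so any efficiency gain will save a lot of resources in the long run.

Here is my code: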

    WITH date_arr AS(

        SELECT 
        date

        FROM UNNEST(
            GENERATE_DATE_ARRAY(
                DATE_SUB(CURRENT_DATE(),INTERVAL 365 DAY), 
                CURRENT_DATE(), 
                INTERVAL 1 day
            )
        ) AS date

    ),

    metric_one_raw AS (

        SELECT 
        date,
        metric_level

        FROM database
        WHERE metric_name = 'metric_one'

    ),

    metric_one_gapless AS (

        SELECT
        d.date AS date,
        IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_one

        FROM date_arr d
        LEFT JOIN metric_one_raw i
        ON d.date = i.date
        WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    ),

    metric_two_raw AS (

        SELECT 
        date,
        metric_level

        FROM database
        WHERE metric_name = 'metric_two'

    ),

    metric_two_gapless AS (

        SELECT
        d.date AS date,
        IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_two

        FROM date_arr d
        LEFT JOIN metric_two_raw i
        ON d.date = i.date
        WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    ),

    metric_three_raw AS (

        SELECT 
        date,
        metric_level

        FROM database
        WHERE metric_name = 'metric_three'

    ),

    metric_three_gapless AS (

        SELECT
        d.date AS date,
        IFNULL(metric_level, LAST_VALUE(metric_level IGNORE NULLS) OVER(window_latest)) AS metric_three

        FROM date_arr d
        LEFT JOIN metric_three_raw i
        ON d.date = i.date
        WINDOW window_latest AS (ORDER BY d.date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    )

    SELECT
    *
    FROM metric_one_gapless
    LEFT JOIN metric_two_gapless USING(date)
    LEFT JOIN metric_three_gapless USING(date)

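I hope this makes sense. Thanks in advance!

2 Answers:

Answer 0: (score: 0)

You can do the following:

  • Generate the dates.
  • Use a cross join to get all the rows (every date for every metric).
  • Use a left join to bring in the data.
  • Use last_value() to fill in the NULL values.

In other databases I would use lag(ignore nulls), but BigQuery does not support it.

So: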

    -- `metrics` stands for the source table and `metric` for its metric-name column.
    select d, m.metric,
           coalesce(mm.metric_level,
                    last_value(mm.metric_level ignore nulls) over (partition by m.metric order by d)
                   ) as metric_level
    from (select distinct metric from metrics) m cross join
         unnest(generate_date_array(date_sub(current_date(), interval 1 year),
                                    current_date(),
                                    interval 1 day)) d left join
         metrics mm
         on mm.metric = m.metric and mm.date = d;
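
The query above returns one row per date and metric, whereas the question asks for one column per metric. A minimal sketch of one way to get the wide shape in a single pass over the source table is to pivot with conditional aggregation first and then forward-fill each column. The table name `metrics` and its columns (`date`, `metric_name`, `metric_level`) are placeholders taken from the question's sample data, and the three metric names are hard-coded, so this is an illustration under those assumptions rather than a tested solution:

```sql
-- Sketch only: `metrics` (date, metric_name, metric_level) is a placeholder
-- for the real source table; metric names come from the question's sample.
WITH dates AS (
  SELECT d
  FROM UNNEST(GENERATE_DATE_ARRAY(
         DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR),
         CURRENT_DATE(),
         INTERVAL 1 DAY)) AS d
),
pivoted AS (
  -- One scan of the source table: one row per date, one column per metric.
  -- MAX() picks the largest value if a metric has several rows on one date.
  SELECT
    date,
    MAX(IF(metric_name = 'metric_one',   metric_level, NULL)) AS metric_one,
    MAX(IF(metric_name = 'metric_two',   metric_level, NULL)) AS metric_two,
    MAX(IF(metric_name = 'metric_three', metric_level, NULL)) AS metric_three
  FROM metrics
  GROUP BY date
)
SELECT
  dates.d AS date,
  -- Carry the most recent non-NULL value forward into the gaps.
  LAST_VALUE(p.metric_one   IGNORE NULLS) OVER w AS metric_one,
  LAST_VALUE(p.metric_two   IGNORE NULLS) OVER w AS metric_two,
  LAST_VALUE(p.metric_three IGNORE NULLS) OVER w AS metric_three
FROM dates
LEFT JOIN pivoted AS p
  ON p.date = dates.d
WINDOW w AS (ORDER BY dates.d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY dates.d;
```

Compared with one pair of CTEs and one join per metric, this reads the source table once and joins once. As in the question's own code, values recorded before the start of the date range are not carried into it.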

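Answer 1: (score: 0)

After doing some research I came up with something. You are using left joins, there will be more than one of them (possibly a variable number), and you cannot use DECLARE in the BigQuery Web UI, so you may be better off using the BigQuery REST API. Its client libraries, linked from the original answer, are available for C#, Go, Java, Node.js, PHP, Python, and Ruby, which lets you hold the number of metrics in a variable. I therefore suggest first running a SELECT to find out how many metrics there are, saving the result in a variable, and then looping over the metrics to build the left joins you need.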
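
The two steps described above (first list the metrics, then generate one expression per metric) can also be sketched in SQL itself, on the assumption that BigQuery scripting statements such as DECLARE and EXECUTE IMMEDIATE are available in the environment being used, which this answer says was not the case in the Web UI. The table name `metrics`, its columns, and the variable `pivot_cols` are hypothetical, and the sketch only builds the pivot, not the gap filling:

```sql
-- Sketch only: assumes BigQuery scripting is available and that `metrics`
-- (date, metric_name, metric_level) is a placeholder for the real table.
-- Metric names are spliced into the SQL text as column aliases, so they
-- must be valid identifiers and must come from a trusted source.
DECLARE pivot_cols STRING;

-- Step 1: find the distinct metric names and build one expression for each.
SET pivot_cols = (
  SELECT STRING_AGG(
           FORMAT("MAX(IF(metric_name = '%s', metric_level, NULL)) AS %s",
                  metric_name, metric_name),
           ', ' ORDER BY metric_name)
  FROM (SELECT DISTINCT metric_name FROM metrics)
);

-- Step 2: run the generated query, one column per metric.
EXECUTE IMMEDIATE FORMAT("""
  SELECT date, %s
  FROM metrics
  GROUP BY date
  ORDER BY date
""", pivot_cols);
```

A client-library version would follow the same shape: query the distinct metric names, build the column list in a loop, then submit the generated query.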

I hope this information helps. If you need more information, I am here.