How do I calculate a monthly moving average in BigQuery?

Time: 2019-07-19 19:38:46

Tags: google-bigquery

Given the following sample data:

+---------------+--------+---------+------------+
|  customer_id  |  city  |  spend  | timestamp  |
+---------------+--------+---------+------------+
| 1             | A      |  0.7    | 2019-02-12 |
| 2             | B      |  0.9    | 2019-02-12 |
| 3             | C      |  0.8    | 2019-02-12 |
| 4             | B      |  0.95   | 2019-02-12 |
+---------------+--------+---------+------------+

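I want to answer the following question: in each city, how much does each customer spend on average per month? The result should look like this: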

+--------+---------+------------+
|  city  |   avg   | timestamp  |
+--------+---------+------------+
| A      |  ...    | 2019-02-12 |
| B      |  ...    | 2019-02-12 |
| C      |  ...    | 2019-02-12 |
+--------+---------+------------+

I tried to solve it with a moving average:

SELECT
  city,
  AVG(spend) OVER (PARTITION BY customer_id ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) avg_spend,
  date
FROM (
  SELECT 
    customer_id,
    city,
    AVG(spend) spend,
    date
  FROM `project.dataset.table`
  GROUP BY customer_id, city, date
)
ORDER BY date DESC

I am getting (small) numbers for avg_spend that look more like daily averages than monthly averages. Any idea what is wrong with my query?

1 answer:

Answer 0: (score: 0)

Try one of these, depending on the result you want (grouped or not grouped):


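-- Variant 1 (grouped): one row per customer, city and month.
-- avg_amount_spent is the mean of that customer's daily totals within the month.
with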

sample as (
    select
        *
    from
        unnest(
            array[
                struct(1 as customer_id, 'A' as city, 1000 as amount, timestamp'2019-02-12' as timestamp),
                struct(1, 'A', 2000 , timestamp'2019-02-25'),
                struct(1, 'A',  800, timestamp'2019-03-12'),
                struct(1, 'B', 4500, timestamp'2019-03-10'),
                struct(1, 'B',  500, timestamp'2019-03-14'),

                struct(2, 'A', 1350, timestamp'2019-02-05'),
                struct(2, 'A',   50, timestamp'2019-02-14'),

                struct(3, 'B', 2000, timestamp'2019-04-02'),
                struct(3, 'B', 4000, timestamp'2019-05-22')
            ]
        )
)

select
    customer_id,
    city,
    month,
    avg(spent_by_day) as avg_amount_spent
from
    (   select
            customer_id,
            city,
            date(timestamp) as date,
            date_trunc(date(timestamp), month) as month,
            sum(amount) as spent_by_day
        from
            sample
        group by
            1, 2, 3, 4)
group by
    1, 2, 3
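
The expected output in the question has one row per city rather than one per customer. If that is what is wanted, a possible variant is to total each customer's spend per calendar month first and then average those totals across customers within the city. This is only a sketch: it assumes the question's table `project.dataset.table` with the columns customer_id, city, spend and timestamp, and the output names spent_in_month and avg_spend_per_customer are made up for illustration.

```sql
-- Sketch under assumptions: table and column names are taken from the question;
-- spent_in_month and avg_spend_per_customer are illustrative names.
-- If timestamp is already a DATE column, drop the date(...) wrapper.
select
    city,
    month,
    avg(spent_in_month) as avg_spend_per_customer
from
    (   -- total spend of each customer in each city and calendar month
        select
            customer_id,
            city,
            date_trunc(date(timestamp), month) as month,
            sum(spend) as spent_in_month
        from
            `project.dataset.table`
        group by
            1, 2, 3)
group by
    1, 2
order by
    month, city
```

Customers with no spend in a given month are simply absent from that month's average here; they are not counted as zero.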

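-- Variant 2 (not grouped): one row per customer, city and day,
-- each row carrying the same monthly mean via a window function.
with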

sample as (
    select
        *
    from
        unnest(
            array[
                struct(1 as customer_id, 'A' as city, 1000 as amount, timestamp'2019-02-12' as timestamp),
                struct(1, 'A', 2000 , timestamp'2019-02-25'),
                struct(1, 'A',  800, timestamp'2019-03-12'),
                struct(1, 'B', 4500, timestamp'2019-03-10'),
                struct(1, 'B',  500, timestamp'2019-03-14'),

                struct(2, 'A', 1350, timestamp'2019-02-05'),
                struct(2, 'A',   50, timestamp'2019-02-14'),

                struct(3, 'B', 2000, timestamp'2019-04-02'),
                struct(3, 'B', 4000, timestamp'2019-05-22')
            ]
        )
)

select
    customer_id,
    city,
    date,
    avg(spent_by_day) over( partition by
                                customer_id,
                                city,
                                month) as avg_amount_spent
from
    (   select
            customer_id,
            city,
            date(timestamp) as date,
            date_trunc(date(timestamp), month) as month,
            sum(amount) as spent_by_day
        from
            sample
        group by
            1, 2, 3, 4)
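
As for why the query in the question returns small numbers: `ROWS BETWEEN 29 PRECEDING AND CURRENT ROW` counts rows, not days, and the value being averaged is already a per-day figure, so the result is a mean of daily values rather than a monthly amount. If a true rolling 30-day window is wanted instead of calendar months, one option is a `RANGE` frame over a numeric day number. The sketch below assumes the question's table and columns; the output name avg_daily_spend_last_30_days is illustrative.

```sql
-- Sketch under assumptions: table and column names are taken from the question.
-- unix_date() turns the date into a day number, so the RANGE frame
-- covers the current day and the 29 calendar days before it,
-- regardless of how many rows fall inside that span.
select
    customer_id,
    city,
    date,
    avg(spent_by_day) over (
        partition by customer_id, city
        order by unix_date(date)
        range between 29 preceding and current row
    ) as avg_daily_spend_last_30_days
from
    (   -- one row per customer, city and day
        select
            customer_id,
            city,
            date(timestamp) as date,
            sum(spend) as spent_by_day
        from
            `project.dataset.table`
        group by
            1, 2, 3)
order by
    date desc
```

Replacing `avg` with `sum` in the window would give the total spent over the trailing 30 days rather than the mean of the daily totals.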