Given the following sample data:
+-------------+------+-------+------------+
| customer_id | city | spend | timestamp  |
+-------------+------+-------+------------+
| 1           | A    | 0.7   | 2019-02-12 |
| 2           | B    | 0.9   | 2019-02-12 |
| 3           | C    | 0.8   | 2019-02-12 |
| 4           | B    | 0.95  | 2019-02-12 |
+-------------+------+-------+------------+
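I want to answer the following question: for each city, how much does each customer spend per month on average? The result should look like this: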
+------+-----+------------+
| city | avg | timestamp  |
+------+-----+------------+
| A    | ... | 2019-02-12 |
| B    | ... | 2019-02-12 |
| C    | ... | 2019-02-12 |
+------+-----+------------+
I tried to solve it with a moving average:
SELECT
  city,
  AVG(spend) OVER (PARTITION BY customer_id ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) avg_spend,
  date
FROM (
  SELECT
    customer_id,
    city,
    AVG(spend) spend,
    date
  FROM `project.dataset.table`
  GROUP BY customer_id, city, date
)
ORDER BY date DESC
The avg_spend values I get are small and look more like daily averages than monthly averages. Any idea what is wrong with my query?
Answer 0 (score: 0)
Try one of these, depending on whether you want the result grouped or not:
-- Grouped: one row per customer, city, and month
with
  sample as (
    select
      *
    from
      unnest(
        array[
          struct(1 as customer_id, 'A' as city, 1000 as amount, timestamp '2019-02-12' as timestamp),
          struct(1, 'A', 2000, timestamp '2019-02-25'),
          struct(1, 'A', 800, timestamp '2019-03-12'),
          struct(1, 'B', 4500, timestamp '2019-03-10'),
          struct(1, 'B', 500, timestamp '2019-03-14'),
          struct(2, 'A', 1350, timestamp '2019-02-05'),
          struct(2, 'A', 50, timestamp '2019-02-14'),
          struct(3, 'B', 2000, timestamp '2019-04-02'),
          struct(3, 'B', 4000, timestamp '2019-05-22')
        ]
      )
  )
select
  customer_id,
  city,
  month,
  -- average of the daily totals within each month
  avg(spent_by_day) as avg_amount_spent
from
  (
    -- total spend per customer, city, and day
    select
      customer_id,
      city,
      date(timestamp) as date,
      date_trunc(date(timestamp), month) as month,
      sum(amount) as spent_by_day
    from
      sample
    group by
      1, 2, 3, 4
  )
group by
  1, 2, 3

-- Not grouped: one row per customer, city, and day, with the monthly average attached
with
  sample as (
    select
      *
    from
      unnest(
        array[
          struct(1 as customer_id, 'A' as city, 1000 as amount, timestamp '2019-02-12' as timestamp),
          struct(1, 'A', 2000, timestamp '2019-02-25'),
          struct(1, 'A', 800, timestamp '2019-03-12'),
          struct(1, 'B', 4500, timestamp '2019-03-10'),
          struct(1, 'B', 500, timestamp '2019-03-14'),
          struct(2, 'A', 1350, timestamp '2019-02-05'),
          struct(2, 'A', 50, timestamp '2019-02-14'),
          struct(3, 'B', 2000, timestamp '2019-04-02'),
          struct(3, 'B', 4000, timestamp '2019-05-22')
        ]
      )
  )
select
  customer_id,
  city,
  date,
  -- average of the daily totals over all days in the same month
  avg(spent_by_day) over (
    partition by
      customer_id,
      city,
      month
  ) as avg_amount_spent
from
  (
    -- total spend per customer, city, and day
    select
      customer_id,
      city,
      date(timestamp) as date,
      date_trunc(date(timestamp), month) as month,
      sum(amount) as spent_by_day
    from
      sample
    group by
      1, 2, 3, 4
  )
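
If you need exactly the shape asked for in the question (one row per city and month), one possible sketch is to total each customer's spend per calendar month first and then average those totals across the customers of each city. This assumes the same made-up `sample` data and `amount` column used in the queries above; the names `spend_month`, `monthly`, `monthly_spend`, and `avg_monthly_spend_per_customer` are only illustrative, and you would swap `sample` for your own table.

```sql
-- Sketch only: the sample rows and all column aliases are assumptions for illustration.
with
  sample as (
    select
      *
    from
      unnest(
        array[
          struct(1 as customer_id, 'A' as city, 1000 as amount, timestamp '2019-02-12' as timestamp),
          struct(1, 'A', 2000, timestamp '2019-02-25'),
          struct(1, 'A', 800, timestamp '2019-03-12'),
          struct(1, 'B', 4500, timestamp '2019-03-10'),
          struct(1, 'B', 500, timestamp '2019-03-14'),
          struct(2, 'A', 1350, timestamp '2019-02-05'),
          struct(2, 'A', 50, timestamp '2019-02-14'),
          struct(3, 'B', 2000, timestamp '2019-04-02'),
          struct(3, 'B', 4000, timestamp '2019-05-22')
        ]
      )
  ),
  monthly as (
    -- Step 1: total spend of each customer in each city per calendar month
    select
      customer_id,
      city,
      date_trunc(date(timestamp), month) as spend_month,
      sum(amount) as monthly_spend
    from
      sample
    group by
      1, 2, 3
  )
-- Step 2: average the monthly totals over the customers of each city
select
  city,
  spend_month,
  avg(monthly_spend) as avg_monthly_spend_per_customer
from
  monthly
group by
  1, 2
order by
  1, 2
```

With this sample, customer 1 spends 3000 and customer 2 spends 1400 in city A during February 2019, so that row's average is 2200. A 30-row moving window over daily averages, as in the question's query, averages per-day values instead of monthly totals, which may be why the numbers looked like daily averages.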