I have a large table containing the following data:
date sku sales
2019-02-13 123 10
2019-02-14 123 10
2019-02-15 123 10
2019-02-16 123 10
2019-02-17 123 10
2019-02-18 123 10
2019-02-19 123 10
2019-02-20 123 10
2019-02-21 456 10
2019-02-22 456 10
I want to query the table with a GROUP BY over every 7 days, so that I get:
begin_date sku sales week
2019-02-13 123 70 1
2019-02-20 123 10 2
2019-02-21 456 20 1
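In other words, for each SKU I want to group every 7 records (one per day) together and keep the first date of each group. One important caveat: the rows in the actual table are not sorted by date or by sku.
Thanks!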
Answer 0 (score: 2)
The following works in BigQuery Standard SQL:
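#standardSQL
-- earliest date of each SKU; the table does not need to be sorted
-- (assumes the date column is named dt)
WITH skus AS (
SELECT sku, MIN(dt) AS start_date
FROM `project.dataset.table`
GROUP BY sku
)
SELECT
MIN(dt) begin_date,
sku,
SUM(sales) sales,
-- days 0-6 after start_date -> week 1, days 7-13 -> week 2, ...
DIV(DATE_DIFF(dt, start_date, DAY) + 7, 7) week
FROM `project.dataset.table` t
JOIN skus s USING(sku)
GROUP BY sku, week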
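The week number comes from integer division on each row's day offset from the SKU's first date. As a quick sanity check outside BigQuery, here is a minimal Python sketch of that expression (the function name `week_number` is hypothetical). For the non-negative offsets this query produces, `(d + 7) // 7` is the same as `1 + d // 7`, so days 0-6 map to week 1, days 7-13 to week 2, and so on.

```python
def week_number(days_since_first: int) -> int:
    """Mirror of DIV(DATE_DIFF(dt, start_date, DAY) + 7, 7).

    days_since_first is never negative here, because start_date is the
    SKU's earliest date, so plain integer division gives the same result
    under any rounding convention.
    """
    return (days_since_first + 7) // 7


# Offsets 0 and 6 fall in week 1, 7 and 13 in week 2, 14 starts week 3.
for d in (0, 6, 7, 13, 14):
    print(d, week_number(d))
```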
You can test it with the sample data from your question, as in the following example:
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2019-02-13' dt, '123' sku, 10 sales UNION ALL
SELECT '2019-02-14', '123', 10 UNION ALL
SELECT '2019-02-15', '123', 10 UNION ALL
SELECT '2019-02-16', '123', 10 UNION ALL
SELECT '2019-02-17', '123', 10 UNION ALL
SELECT '2019-02-18', '123', 10 UNION ALL
SELECT '2019-02-19', '123', 10 UNION ALL
SELECT '2019-02-20', '123', 10 UNION ALL
SELECT '2019-02-21', '456', 10 UNION ALL
SELECT '2019-02-22', '456', 10
), skus AS (
SELECT sku, MIN(dt) AS start_date
FROM `project.dataset.table`
GROUP BY sku
)
SELECT
MIN(dt) begin_date,
sku,
SUM(sales) sales,
DIV(DATE_DIFF(dt, start_date, DAY) + 7, 7) week
FROM `project.dataset.table` t
JOIN skus s USING(sku)
GROUP BY sku, week
-- ORDER BY sku, week
with the result:
Row begin_date sku sales week
1 2019-02-13 123 70 1
2 2019-02-20 123 10 2
3 2019-02-21 456 20 1
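To see why the unsorted row order does not matter, here is a Python sketch that replays the same two steps (earliest date per SKU, then aggregate per `(sku, week)`) on the question's sample rows, deliberately shuffled. `weekly_sales` is a hypothetical helper that only imitates the query's logic in memory; it is not part of BigQuery.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question as (dt, sku, sales), shuffled on purpose
# because the real table is not ordered by date or sku.
rows = [
    (date(2019, 2, 21), "456", 10),
    (date(2019, 2, 17), "123", 10),
    (date(2019, 2, 20), "123", 10),
    (date(2019, 2, 13), "123", 10),
    (date(2019, 2, 22), "456", 10),
    (date(2019, 2, 19), "123", 10),
    (date(2019, 2, 14), "123", 10),
    (date(2019, 2, 18), "123", 10),
    (date(2019, 2, 16), "123", 10),
    (date(2019, 2, 15), "123", 10),
]


def weekly_sales(rows):
    # Step 1, like the `skus` CTE: earliest date per SKU.
    start = {}
    for dt, sku, _ in rows:
        if sku not in start or dt < start[sku]:
            start[sku] = dt

    # Step 2: bucket every row by (sku, week), summing sales and keeping
    # the earliest date in each bucket (MIN(dt) AS begin_date).
    begin = {}
    total = defaultdict(int)
    for dt, sku, sales in rows:
        week = ((dt - start[sku]).days + 7) // 7
        key = (sku, week)
        total[key] += sales
        if key not in begin or dt < begin[key]:
            begin[key] = dt

    # Sorting the keys plays the role of the commented-out ORDER BY sku, week.
    return [(begin[k], k[0], total[k], k[1]) for k in sorted(total)]


for row in weekly_sales(rows):
    print(row)
```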
Answer 1 (score: 1)
You can compute the first day for each SKU and then use it:
-- mindate is each SKU's first date, computed with a window function;
-- div() keeps the day count an integer, which INTERVAL requires
select date_add(mindate, interval div(date_diff(date, mindate, day), 7) * 7 day) as week_start,
       sku, sum(sales) as sales,
       1 + div(date_diff(date, mindate, day), 7) as weeks
from (select t.*, min(date) over (partition by sku) as mindate
      from t
     ) t
group by sku, weeks, week_start;
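One difference from the first answer: `week_start` here is the nominal start of each 7-day bucket (`mindate` plus a multiple of 7 days), whereas `MIN(dt)` in the first answer is the first date that actually has a row in the bucket. On the sample data they agree, but they diverge when a bucket's first day has no sales. A minimal Python sketch of the `week_start` arithmetic illustrates this; the helper name `week_start` and the 2019-02-22 row for SKU 123 are hypothetical.

```python
from datetime import date, timedelta


def week_start(day: date, first_day: date) -> date:
    # Mirrors date_add(mindate, interval div(date_diff(date, mindate, day), 7) * 7 day):
    # snap `day` back to the start of its 7-day bucket, counted from first_day.
    bucket = (day - first_day).days // 7
    return first_day + timedelta(days=7 * bucket)


first = date(2019, 2, 13)
print(week_start(date(2019, 2, 19), first))  # 2019-02-13
print(week_start(date(2019, 2, 20), first))  # 2019-02-20
# Hypothetical gap: if SKU 123 had no row on 2019-02-20 and its only sale in
# the second bucket were on 2019-02-22, the bucket would still start 2019-02-20.
print(week_start(date(2019, 2, 22), first))  # 2019-02-20
```

With `MIN(dt)`, that same hypothetical bucket would report 2019-02-22 instead, so pick whichever definition of "begin date" you actually need.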