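I'm having a hard time figuring out how to do this.
I have invoice data for every day (well, most days) that I need to group into weeks. However, if a week runs into the next month, I need the bucket to contain only the current month's days, and the next bucket to run from the 1st through the following Saturday. The next full week then starts again on Sunday.
Right now we don't group it at all and just export by day, which gives about 60 million rows for a rolling 2 years (it is more complex than the example, since it is also split by item and customer). This is then imported into our demand-planning software, which has both weekly and monthly models. It has no trouble putting daily rows into the correct buckets.
However, since we are running into some time constraints, I would like to reduce those ~60 million rows. The data must still import accurately into both the weekly and monthly models.
How can I group it this way?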
Example Data set
+------------+------------+
| date | sales |
+------------+------------+
| 2014-06-22 | 100 |
| 2014-06-23 | 200 |
| 2014-06-24 | 300 |
| 2014-06-25 | 150 |
| 2014-06-26 | 170 |
| 2014-06-27 | 210 |
| 2014-06-28 | 220 |
| 2014-06-29 | 120 |
| 2014-06-30 | 110 |
| 2014-07-01 | 190 |
| 2014-07-02 | 210 |
| 2014-07-03 | 100 |
| 2014-07-04 | 140 |
| 2014-07-05 | 150 |
| 2014-07-06 | 130 |
| 2014-07-07 | 420 |
| 2014-07-08 | 310 |
| 2014-07-09 | 290 |
| 2014-07-10 | 180 |
| 2014-07-11 | 140 |
| 2014-07-12 | 210 |
+------------+------------+
Expected Result:
+------------+------------+
| date | sum(sales) |
+------------+------------+
| 2014-06-22 | 1350 | 7 days in group
| 2014-06-29 | 230 | 2 days in group
| 2014-07-01 | 790 | 5 days in group
| 2014-07-06 | 1680 | 7 days in group
+------------+------------+
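Put another way, each day's bucket starts on the later of two dates: the most recent Sunday (the day itself if it is a Sunday) and the 1st of its month. Below is a minimal Python sketch of that rule, meant only as a database-free cross-check of the expected result above. The names `bucket_start`, `bucket_totals`, and `SAMPLE` are hypothetical and not part of any schema.

```python
from collections import defaultdict
from datetime import date, timedelta


def bucket_start(d: date) -> date:
    """Later of the most recent Sunday and the 1st of d's month."""
    # date.weekday() is Monday=0 .. Sunday=6, so (weekday + 1) % 7
    # is the number of days elapsed since the last Sunday.
    sunday = d - timedelta(days=(d.weekday() + 1) % 7)
    return max(sunday, d.replace(day=1))


# Daily sales from the example data set, 2014-06-22 through 2014-07-12.
SAMPLE_SALES = [100, 200, 300, 150, 170, 210, 220, 120, 110, 190, 210,
                100, 140, 150, 130, 420, 310, 290, 180, 140, 210]
SAMPLE = {date(2014, 6, 22) + timedelta(days=i): s
          for i, s in enumerate(SAMPLE_SALES)}


def bucket_totals(rows):
    """Sum sales per bucket, keyed by the bucket's start date."""
    totals = defaultdict(int)
    for d, sales in rows.items():
        totals[bucket_start(d)] += sales
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for start, total in bucket_totals(SAMPLE).items():
        print(start, total)
```

Run against the sample data, this yields the same four buckets and sums as the expected result table.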
Edit:
We came up with a working solution. Feel free to improve it if needed.
SELECT DATE(IF(
MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`)
, DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)
, DATE_FORMAT(`date`,'%Y-%m-01')
)) AS datekey
, SUM(val) AS valsum
FROM tmp.testdata
GROUP BY IF(
MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`) -- If the closest previous Sunday from date falls within the same month as the date...
, DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY) -- ...use the date of the closest previous Sunday as the key...
, DATE_FORMAT(`date`,'%Y-%m-01') -- ...otherwise use the 1st of the month the date falls in as the key (since that must mean the date falls in that opening partial week).
)
ORDER BY datekey
Thanks, everyone! We combined a few of these suggestions and ended up with:
SELECT MIN(`date`) AS datekey
, SUM(val) AS valsum
FROM tmp.testdata
GROUP BY DATE_FORMAT(`date`, '%U'), MONTH(`date`), YEAR(`date`)
ORDER BY datekey
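Or, if we always want the bucket key to be a Sunday or the 1st of the month (for example, when not every day has invoices), we combined my solution with the one here, because this GROUP BY is faster: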
SELECT
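-- MIN() keeps the query valid under ONLY_FULL_GROUP_BY; the key is identical for every row in a group.
MIN(DATE(IF(MONTH(DATE_SUB(`date`,
INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`),
DATE_SUB(`date`,
INTERVAL DAYOFWEEK(`date`) - 1 DAY),
DATE_FORMAT(`date`, '%Y-%m-01')))) AS datekey,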
SUM(val) AS valsum
FROM
tmp.testdata
GROUP BY DATE_FORMAT(`date`, '%U') , MONTH(`date`) , YEAR(`date`)
ORDER BY datekey
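The two queries only differ when the first day of a bucket has no invoices. Here is a small Python sketch of that case, assuming a hypothetical data set in which 2014-06-29 and 2014-07-01 are missing. Python's `strftime("%U")` uses the same Sunday-first week numbering as MySQL's `%U`; `group_key` and `computed_key` are made-up helper names.

```python
from datetime import date, timedelta


def group_key(d):
    # Mirrors GROUP BY DATE_FORMAT(date, '%U'), MONTH(date), YEAR(date).
    return (d.strftime("%U"), d.month, d.year)


def computed_key(d):
    # Mirrors the IF(...) expression: previous Sunday, or the 1st of the
    # month when that Sunday falls in the previous month.
    sunday = d - timedelta(days=(d.weekday() + 1) % 7)
    return max(sunday, d.replace(day=1))


# Hypothetical days with invoices; 2014-06-29 and 2014-07-01 have none.
days = [date(2014, 6, 30), date(2014, 7, 2), date(2014, 7, 3)]

min_keys = {}    # what MIN(date) reports per group
fixed_keys = {}  # what the computed datekey reports per group
for d in days:
    g = group_key(d)
    min_keys[g] = min(min_keys.get(g, d), d)
    fixed_keys[g] = computed_key(d)

print(sorted(min_keys.values()))    # first day that has data in each bucket
print(sorted(fixed_keys.values()))  # Sunday or 1st of month for each bucket
```

With `MIN(date)` the keys come out as 2014-06-30 and 2014-07-02. The computed key gives 2014-06-29 and 2014-07-01, which is what an importer expecting Sunday or first-of-month dates would need.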
Answer 0 (score: 1)
Here's something to consider...
calendar is a simple table of dates...
SELECT MIN(dt),YEARWEEK(dt),MONTH(dt) FROM calendar WHERE dt BETWEEN '2014-01-01' AND '2014-12-31' GROUP BY YEARWEEK(dt),MONTH(dt);
+------------+--------------+-----------+
| MIN(dt) | YEARWEEK(dt) | MONTH(dt) |
+------------+--------------+-----------+
| 2014-01-01 | 201352 | 1 |
| 2014-01-05 | 201401 | 1 |
| 2014-01-12 | 201402 | 1 |
| 2014-01-19 | 201403 | 1 |
| 2014-01-26 | 201404 | 1 |<-- Overlap
| 2014-02-01 | 201404 | 2 |<-- Overlap
| 2014-02-02 | 201405 | 2 |
| 2014-02-09 | 201406 | 2 |
| 2014-02-16 | 201407 | 2 |
| 2014-02-23 | 201408 | 2 |<-- Overlap
| 2014-03-01 | 201408 | 3 |<-- Overlap
| 2014-03-02 | 201409 | 3 |
| 2014-03-09 | 201410 | 3 |
| 2014-03-16 | 201411 | 3 |
| 2014-03-23 | 201412 | 3 |
| 2014-03-30 | 201413 | 3 |<-- Overlap
| 2014-04-01 | 201413 | 4 |<-- Overlap
| 2014-04-06 | 201414 | 4 |
| 2014-04-13 | 201415 | 4 |
| 2014-04-20 | 201416 | 4 |
| 2014-04-27 | 201417 | 4 |<-- Overlap
| 2014-05-01 | 201417 | 5 |<-- Overlap
| 2014-05-04 | 201418 | 5 |
| 2014-05-11 | 201419 | 5 |
| 2014-05-18 | 201420 | 5 |
| 2014-05-25 | 201421 | 5 |<-- No overlap
| 2014-06-01 | 201422 | 6 |<-- No overlap
| 2014-06-08 | 201423 | 6 |
| 2014-06-15 | 201424 | 6 |
| 2014-06-22 | 201425 | 6 |
| 2014-06-29 | 201426 | 6 |<-- Overlap
| 2014-07-01 | 201426 | 7 |<-- Overlap
| 2014-07-06 | 201427 | 7 |
| 2014-07-13 | 201428 | 7 |
| 2014-07-20 | 201429 | 7 |
| 2014-07-27 | 201430 | 7 |<-- Overlap
| 2014-08-01 | 201430 | 8 |<-- Overlap
| 2014-08-03 | 201431 | 8 |
| 2014-08-10 | 201432 | 8 |
| 2014-08-17 | 201433 | 8 |
| 2014-08-24 | 201434 | 8 |
| 2014-08-31 | 201435 | 8 |<-- Overlap
| 2014-09-01 | 201435 | 9 |<-- Overlap
| 2014-09-07 | 201436 | 9 |
| 2014-09-14 | 201437 | 9 |
| 2014-09-21 | 201438 | 9 |
| 2014-09-28 | 201439 | 9 |<-- Overlap
| 2014-10-01 | 201439 | 10 |<-- Overlap
| 2014-10-05 | 201440 | 10 |
| 2014-10-12 | 201441 | 10 |
| 2014-10-19 | 201442 | 10 |
| 2014-10-26 | 201443 | 10 |<-- Overlap
| 2014-11-01 | 201443 | 11 |<-- Overlap
| 2014-11-02 | 201444 | 11 |
| 2014-11-09 | 201445 | 11 |
| 2014-11-16 | 201446 | 11 |
| 2014-11-23 | 201447 | 11 |
| 2014-11-30 | 201448 | 11 |<-- Overlap
| 2014-12-01 | 201448 | 12 |<-- Overlap
| 2014-12-07 | 201449 | 12 |
| 2014-12-14 | 201450 | 12 |
| 2014-12-21 | 201451 | 12 |
| 2014-12-28 | 201452 | 12 |
+------------+--------------+-----------+
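To see where the splits fall without building a calendar table, the same grouping can be approximated in Python. This is only a sketch: it keys each day on the Sunday that opens its week rather than on MySQL's YEARWEEK() number, and `split_buckets` is a hypothetical helper.

```python
from datetime import date, timedelta


def split_buckets(year):
    """First day of every (week, month) bucket in the given year, in order."""
    firsts = {}
    d = date(year, 1, 1)
    while d.year == year:
        # Sunday that opens the week containing d (Sunday-first weeks).
        week_start = d - timedelta(days=(d.weekday() + 1) % 7)
        # Days are visited in ascending order, so the first date stored
        # for a key plays the role of MIN(dt).
        firsts.setdefault((week_start, d.month), d)
        d += timedelta(days=1)
    return sorted(firsts.values())


buckets = split_buckets(2014)
# Buckets that do not open on a Sunday, i.e. those created by a month start.
mid_week_starts = [b for b in buckets if b.weekday() != 6]

print(len(buckets))
print(mid_week_starts)
```

For 2014 this gives one bucket per row of the table above. 2014-06-01 does not appear among the mid-week starts because it falls on a Sunday, matching the "No overlap" rows.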
Answer 1 (score: 1)
SELECT min(date),sum(sales) FROM sales GROUP BY WEEKOFYEAR(date), MONTH(date);
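Update: WEEKOFYEAR() uses the MySQL calendar in which weeks start on Monday. I found that you can use DATE_FORMAT with '%U' to get the week number for weeks that start on Sunday instead.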
SELECT min(date),sum(sales) FROM sales GROUP BY DATE_FORMAT(date, '%U'), MONTH(date);
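The Monday-versus-Sunday difference is easy to check outside the database. In the sketch below, Python's `isocalendar()` week stands in for WEEKOFYEAR() (both number weeks ISO-style, starting on Monday), and `strftime("%U")` stands in for DATE_FORMAT(date, '%U'). Treat the mapping as an illustration rather than a guarantee for every edge case.

```python
from datetime import date

saturday = date(2014, 6, 28)
sunday = date(2014, 6, 29)

# Monday-first (ISO) week numbers: Saturday and Sunday share a week.
iso_weeks = (saturday.isocalendar()[1], sunday.isocalendar()[1])

# Sunday-first week numbers: Sunday opens a new week.
sunday_first_weeks = (saturday.strftime("%U"), sunday.strftime("%U"))

print(iso_weeks)           # (26, 26)
print(sunday_first_weeks)  # ('25', '26')
```

With Monday-first numbering, Sunday 2014-06-29 would be grouped with the preceding Saturday, which is not what the question asks for.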
Answer 2 (score: 0)
We came up with a working solution.
SELECT DATE(IF(
MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`)
, DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)
, DATE_FORMAT(`date`,'%Y-%m-01')
)) AS datekey
, SUM(val) AS valsum
FROM tmp.testdata
GROUP BY IF(
MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`) -- If the closest previous Sunday from date falls within the same month as the date...
, DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY) -- ...use the date of the closest previous Sunday as the key...
, DATE_FORMAT(`date`,'%Y-%m-01') -- ...otherwise use the 1st of the month the date falls in as the key (since that must mean the date falls in that opening partial week).
)
ORDER BY datekey