Group by week, but start a new group when the week crosses into the next month

Time: 2014-06-05 17:07:51

Tags: mysql

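I'm having a hard time figuring out how to do this.

I have invoice data for every day (well, most days) that I need to group into weeks. However, if a week runs into the next month, I need the bucket to contain only the days that fall in the current month; the next bucket then runs from the 1st through the following Saturday. The next full week starts again on Sunday.

Right now we don't group it at all and just export it by day, which gives about 60 million rows for a rolling 2 years (it's more complex than the example, because it's also broken out by item and customer). That export is then imported into our demand planning software, which has both weekly and monthly models. With daily data it has no problem dropping the rows into the correct buckets.

However, since we're running into some time constraints, I'd like to reduce those roughly 60 million rows. The imported data must still work accurately with both the weekly and the monthly models.

How can I group it this way?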

Example Data set
+------------+------------+
| date       | sales      |
+------------+------------+
| 2014-06-22 | 100        |
| 2014-06-23 | 200        |
| 2014-06-24 | 300        |
| 2014-06-25 | 150        |
| 2014-06-26 | 170        |
| 2014-06-27 | 210        |
| 2014-06-28 | 220        |
| 2014-06-29 | 120        |
| 2014-06-30 | 110        |
| 2014-07-01 | 190        |
| 2014-07-02 | 210        |
| 2014-07-03 | 100        |
| 2014-07-04 | 140        |
| 2014-07-05 | 150        |
| 2014-07-06 | 130        |
| 2014-07-07 | 420        |
| 2014-07-08 | 310        |
| 2014-07-09 | 290        |
| 2014-07-10 | 180        |
| 2014-07-11 | 140        |
| 2014-07-12 | 210        |
+------------+------------+


Expected Result:
+------------+------------+
| date       | sum(sales) |
+------------+------------+
| 2014-06-22 | 1350       |  7 days in group
| 2014-06-29 | 230        |  2 days in group
| 2014-07-01 | 790        |  5 days in group
| 2014-07-06 | 1680       |  7 days in group
+------------+------------+
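To try the queries in this thread against the sample data, a table along the following lines could be used. This is only a setup sketch: the name tmp.testdata and the column val are taken from the queries in the edit below, while the column types and the CREATE DATABASE step are assumptions, since the thread never shows the real table definition.

```sql
-- Assumed schema; the real table is also split by item and customer.
CREATE DATABASE IF NOT EXISTS tmp;

CREATE TABLE tmp.testdata (
    `date` DATE NOT NULL,
    val    INT  NOT NULL
);

-- The 21 sample rows from the example data set above.
INSERT INTO tmp.testdata (`date`, val) VALUES
    ('2014-06-22', 100), ('2014-06-23', 200), ('2014-06-24', 300),
    ('2014-06-25', 150), ('2014-06-26', 170), ('2014-06-27', 210),
    ('2014-06-28', 220), ('2014-06-29', 120), ('2014-06-30', 110),
    ('2014-07-01', 190), ('2014-07-02', 210), ('2014-07-03', 100),
    ('2014-07-04', 140), ('2014-07-05', 150), ('2014-07-06', 130),
    ('2014-07-07', 420), ('2014-07-08', 310), ('2014-07-09', 290),
    ('2014-07-10', 180), ('2014-07-11', 140), ('2014-07-12', 210);
```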

Edit:

We came up with a solution that works. Feel free to improve on it if you like.

SELECT DATE(IF(
        MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`)
        , DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)
        , DATE_FORMAT(`date`,'%Y-%m-01')
    )) AS datekey
    , SUM(val) AS valsum

FROM tmp.testdata

GROUP BY IF(
    MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`) -- If the closest previous Sunday from date falls within the same month as the date...
    , DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY) -- ...use the date of the closest previous Sunday as the key...
    , DATE_FORMAT(`date`,'%Y-%m-01') -- ...otherwise use the 1st of the month the date falls in as the key (since that must mean the date falls in that opening partial week).
)

ORDER BY datekey

Thanks, everyone! We combined a few of these, and the result is:

SELECT MIN(`date`) AS datekey
    , SUM(val) AS valsum

FROM tmp.testdata

GROUP BY DATE_FORMAT(`date`, '%U'), MONTH(`date`), YEAR(`date`) 

ORDER BY datekey

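Or, if we always want the bucket key to be a Sunday or the 1st of the month (for example, when not every day has an invoice), we combine my solution with the one from the answers here, because the GROUP BY from the answers is faster: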

SELECT 
    DATE(IF(MONTH(DATE_SUB(`date`,
                    INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`),
            DATE_SUB(`date`,
                INTERVAL DAYOFWEEK(`date`) - 1 DAY),
            DATE_FORMAT(`date`, '%Y-%m-01'))) AS datekey,
    SUM(val) AS valsum
FROM
    tmp.testdata
GROUP BY DATE_FORMAT(`date`, '%U') , MONTH(`date`) , YEAR(`date`)
ORDER BY datekey
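One caveat with that last query: datekey is neither aggregated nor part of the GROUP BY, so a server running with ONLY_FULL_GROUP_BY (part of the default sql_mode since MySQL 5.7) would likely reject it. Below is a sketch of a variant that should pass in that mode. It relies on every row in a group producing the same key, so wrapping the expression in MIN() should not change the result. It also assumes the same tmp.testdata table as above.

```sql
SELECT
    -- All rows in one (year, month, week) group yield the same key,
    -- so MIN() only serves to make the expression an aggregate.
    MIN(DATE(IF(MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`),
                DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY),  -- previous Sunday (or the date itself)
                DATE_FORMAT(`date`, '%Y-%m-01')))) AS datekey,           -- 1st of the month for a partial opening week
    SUM(val) AS valsum
FROM tmp.testdata
GROUP BY YEAR(`date`), MONTH(`date`), DATE_FORMAT(`date`, '%U')
ORDER BY datekey;
```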

3 answers:

Answer 0: (score: 1)

Here's something to think about...

calendar is a simple table of dates...

 SELECT MIN(dt),YEARWEEK(dt),MONTH(dt) FROM calendar WHERE dt BETWEEN '2014-01-01' AND '2014-12-31' GROUP BY YEARWEEK(dt),MONTH(dt);
 +------------+--------------+-----------+
 | MIN(dt)    | YEARWEEK(dt) | MONTH(dt) |
 +------------+--------------+-----------+
 | 2014-01-01 |       201352 |         1 |
 | 2014-01-05 |       201401 |         1 |
 | 2014-01-12 |       201402 |         1 |
 | 2014-01-19 |       201403 |         1 |
 | 2014-01-26 |       201404 |         1 |<-- Overlap
 | 2014-02-01 |       201404 |         2 |<-- Overlap
 | 2014-02-02 |       201405 |         2 |
 | 2014-02-09 |       201406 |         2 |
 | 2014-02-16 |       201407 |         2 |
 | 2014-02-23 |       201408 |         2 |<-- Overlap
 | 2014-03-01 |       201408 |         3 |<-- Overlap
 | 2014-03-02 |       201409 |         3 |
 | 2014-03-09 |       201410 |         3 |
 | 2014-03-16 |       201411 |         3 |
 | 2014-03-23 |       201412 |         3 |
 | 2014-03-30 |       201413 |         3 |<-- Overlap
 | 2014-04-01 |       201413 |         4 |<-- Overlap
 | 2014-04-06 |       201414 |         4 |
 | 2014-04-13 |       201415 |         4 |
 | 2014-04-20 |       201416 |         4 |
 | 2014-04-27 |       201417 |         4 |<-- Overlap
 | 2014-05-01 |       201417 |         5 |<-- Overlap
 | 2014-05-04 |       201418 |         5 |
 | 2014-05-11 |       201419 |         5 |
 | 2014-05-18 |       201420 |         5 |
 | 2014-05-25 |       201421 |         5 |<-- No overlap
 | 2014-06-01 |       201422 |         6 |<-- No overlap
 | 2014-06-08 |       201423 |         6 |
 | 2014-06-15 |       201424 |         6 |
 | 2014-06-22 |       201425 |         6 |
 | 2014-06-29 |       201426 |         6 |<-- Overlap
 | 2014-07-01 |       201426 |         7 |<-- Overlap
 | 2014-07-06 |       201427 |         7 |
 | 2014-07-13 |       201428 |         7 |
 | 2014-07-20 |       201429 |         7 |
 | 2014-07-27 |       201430 |         7 |<-- Overlap
 | 2014-08-01 |       201430 |         8 |<-- Overlap
 | 2014-08-03 |       201431 |         8 |
 | 2014-08-10 |       201432 |         8 |
 | 2014-08-17 |       201433 |         8 |
 | 2014-08-24 |       201434 |         8 |
 | 2014-08-31 |       201435 |         8 |<-- Overlap
 | 2014-09-01 |       201435 |         9 |<-- Overlap
 | 2014-09-07 |       201436 |         9 |
 | 2014-09-14 |       201437 |         9 |
 | 2014-09-21 |       201438 |         9 |
 | 2014-09-28 |       201439 |         9 |<-- Overlap
 | 2014-10-01 |       201439 |        10 |<-- Overlap
 | 2014-10-05 |       201440 |        10 |
 | 2014-10-12 |       201441 |        10 |
 | 2014-10-19 |       201442 |        10 |
 | 2014-10-26 |       201443 |        10 |<-- Overlap
 | 2014-11-01 |       201443 |        11 |<-- Overlap
 | 2014-11-02 |       201444 |        11 |
 | 2014-11-09 |       201445 |        11 |
 | 2014-11-16 |       201446 |        11 |
 | 2014-11-23 |       201447 |        11 |
 | 2014-11-30 |       201448 |        11 |<-- Overlap
 | 2014-12-01 |       201448 |        12 |<-- Overlap
 | 2014-12-07 |       201449 |        12 |
 | 2014-12-14 |       201450 |        12 |
 | 2014-12-21 |       201451 |        12 |
 | 2014-12-28 |       201452 |        12 |
 +------------+--------------+-----------+
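Building on that, the calendar table could also drive the aggregation itself, so that each bucket key always lands on a Sunday or the 1st even when some days have no invoices. This is a rough sketch, not part of the original answer; it assumes a sales table with `date` and sales columns as in the question's example, plus the calendar(dt) table used above.

```sql
SELECT MIN(c.dt)                 AS datekey,  -- first calendar day of the bucket
       COALESCE(SUM(s.sales), 0) AS valsum    -- 0 for buckets with no invoices at all
FROM calendar c
LEFT JOIN sales s ON s.`date` = c.dt          -- keeps calendar days that have no sales rows
WHERE c.dt BETWEEN '2014-06-22' AND '2014-07-12'
GROUP BY YEARWEEK(c.dt), MONTH(c.dt)          -- Sunday-based week, split at month boundaries
ORDER BY datekey;
```

Against the example data in the question this should give the four buckets from the expected result, keyed 2014-06-22, 2014-06-29, 2014-07-01 and 2014-07-06.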

Answer 1: (score: 1)

SELECT min(date),sum(sales) FROM sales GROUP BY WEEKOFYEAR(date), MONTH(date);

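Update: WEEKOFYEAR() uses the MySQL calendar in which weeks start on Monday. I found that you can use DATE_FORMAT with '%U' instead to get week numbers for weeks that start on Sunday.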

SELECT min(date),sum(sales) FROM sales GROUP BY DATE_FORMAT(date, '%U'), MONTH(date), YEAR(date); -- YEAR(date) keeps the same week of different years apart

Answer 2: (score: 0)

We came up with a solution that works.

SELECT DATE(IF(
        MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`)
        , DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)
        , DATE_FORMAT(`date`,'%Y-%m-01')
    )) AS datekey
    , SUM(val) AS valsum

FROM tmp.testdata

GROUP BY IF(
    MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`) -- If the closest previous Sunday from date falls within the same month as the date...
    , DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY) -- ...use the date of the closest previous Sunday as the key...
    , DATE_FORMAT(`date`,'%Y-%m-01') -- ...otherwise use the 1st of the month the date falls in as the key (since that must mean the date falls in that opening partial week).
)

ORDER BY datekey