Group several read dates together as the Nth reading of the year

Time: 2011-05-09 13:46:03

Tags: mysql sql stored-procedures

+------------+------+
| 2011-03-04 |    6 |
| 2011-03-01 |    1 |
| 2011-02-28 |    4 |
| 2011-02-24 |    1 |
| 2011-02-23 |    1 |
| 2011-02-22 |    2 |
| 2011-02-17 |    1 |
| 2011-02-16 |   22 |
| 2011-02-12 | 2033 |
| 2011-02-10 |    1 |
| 2011-02-07 |    1 |
| 2011-01-04 |    1 |
| 2011-01-03 |    5 |
| 2010-12-26 |    6 |
| 2010-12-16 |    1 |
| 2010-12-15 |  158 |
| 2010-12-14 | 1703 |
| 2010-12-13 |  199 |
| 2010-11-08 |    1 |
| 2010-10-28 |    3 |
| 2010-10-27 |    6 |
| 2010-10-25 |    1 |
| 2010-10-21 |  660 |
| 2010-10-20 |  558 |
| 2010-10-19 |  245 |
| 2010-10-18 |  579 |
| 2010-10-15 |   14 |
| 2010-10-14 |    1 |
| 2010-10-04 |    1 |
| 2010-09-08 |    1 |
| 2010-09-01 |    1 |
| 2010-08-31 |    1 |
| 2010-08-30 |    6 |
| 2010-08-26 |    1 |
| 2010-08-24 |    4 |
| 2010-08-23 |    2 |
| 2010-08-19 |    3 |
| 2010-08-18 |  144 |
| 2010-08-17 |  920 |
| 2010-08-16 |  982 |
| 2010-08-03 |    1 |
| 2010-08-02 |    1 |
| 2010-07-12 |    1 |
| 2010-06-30 |    8 |
| 2010-06-29 |    1 |
| 2010-06-28 |    1 |
| 2010-06-23 |    1 |
| 2010-06-22 |    1 |
| 2010-06-17 |    7 |
| 2010-06-16 |  703 |
| 2010-06-15 |  937 |
| 2010-06-14 |  397 |
| 2010-06-10 |    2 |
| 2010-06-09 |    1 |
| 2010-06-01 |    5 |
| 2010-05-26 |    1 |
| 2010-05-05 |    1 |
| 2010-04-27 |    2 |
| 2010-04-26 |    4 |
| 2010-04-24 |    6 |
| 2010-04-22 |    2 |
| 2010-04-21 |  351 |
| 2010-04-20 |  839 |
| 2010-04-19 |  850 |
| 2010-04-18 |    2 |
| 2010-04-15 |    2 |
| 2010-04-07 |    1 |
| 2010-04-01 |    2 |
| 2010-03-30 |    1 |
| 2010-03-22 |    1 |
| 2010-03-10 |    1 |
| 2010-03-08 |    1 |
| 2010-03-04 |    1 |
| 2010-03-01 |    3 |
| 2010-02-27 |    6 |
| 2010-02-25 |    2 |
| 2010-02-23 |    4 |
| 2010-02-22 |    1 |
| 2010-02-18 |  188 |
| 2010-02-17 | 1210 |
| 2010-02-16 |  646 |
| 2010-01-27 |    1 |
| 2010-01-21 |    1 |
| 2010-01-07 |    1 |
| 2010-01-06 |    2 |
| 2010-01-04 |   12 |
+------------+------+

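I have this data set covering the past few years. I want to group nearby read dates together. For example, take the range 2011-02-07 to 2011-03-04 and combine it into one group: reading 1 of that year.

Or combine 2010-10-04 through 2010-10-28 as reading 5 of that year.

The grouping is driven by the second column: read dates with similar readings belong together, so the spikes need to be grouped. There will be 6 periods per year, and they are at least 40 days apart.

How can I do this in MySQL?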

2 Answers:

Answer 0: (score: 2)

I threw your sample data into a simple table:

CREATE TABLE `usage_bill` (
  `readdate` date default NULL,
  `reading` int(11) default NULL
);

I have been able to detect the peaks in a generic way:

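-- Row counters for the two numbered copies of the table
SET @seq1 := 0;
SET @seq2 := 0;

-- Difference between the previous pair of consecutive rows
SET @lastdiff := 0;

-- ref1 is the earlier row of each consecutive pair, ref2 the later one.
-- ref1 is a peak when the readings rose into it and fall after it, so its date and reading are reported.
-- This relies on MySQL evaluating the user variables in row order.
SELECT readdate, reading FROM  (
    SELECT ref1.readdate, ref1.reading, ref2.reading - ref1.reading AS diff,
        (@lastdiff > 0) AND (ref2.reading - ref1.reading) < 0 AS peak,
            @lastdiff := ref2.reading - ref1.reading AS lastdiff FROM
        (SELECT @seq1 := @seq1 + 1 AS rowNum, readdate, reading FROM usage_bill ORDER BY readdate) AS ref1,
        (SELECT @seq2 := @seq2 + 1 AS rowNum, readdate, reading FROM usage_bill ORDER BY readdate) AS ref2
        WHERE ref1.rowNum + 1 = ref2.rowNum ) AS peaks
WHERE peak = 1;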

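In theory it should be possible to add ORDER BY reading DESC LIMIT 6 to get the biggest peaks, but in practice not all of the spikes are clean curves. In October 2010, for example, the count dips on 2010-10-19 before rising again, so a single spike produces two local maxima.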
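On MySQL 8.0 or later, the same rise-then-fall test can probably be written without user variables. The following is only a sketch, assuming the usage_bill table above holds one row per date; the names prev_reading, next_reading, and neighbours are made up for illustration.

```sql
-- Assumption: MySQL 8.0+ (window functions) and one row per readdate in usage_bill.
SELECT readdate, reading
FROM (
    SELECT readdate,
           reading,
           -- Reading on the previous and on the next date, in date order
           LAG(reading)  OVER (ORDER BY readdate) AS prev_reading,
           LEAD(reading) OVER (ORDER BY readdate) AS next_reading
    FROM usage_bill
) AS neighbours
-- A local maximum is strictly higher than both neighbours;
-- the first and last rows drop out because one neighbour is NULL.
WHERE reading > prev_reading
  AND reading > next_reading
ORDER BY readdate;
```

Like the user-variable version, this returns every local maximum, including small ones between the spikes, so it still needs a threshold or a grouping step on top.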

Not sure whether this helps you...

Answer 1: (score: 0)

I have managed to get it working with user variables, but it has a small problem.

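-- @stepdate: start date of the current group; @reading: running group number within the year;
-- @prevdate: step date recorded at the last year change
SET @stepdate := DATE('1970-01-01');
SET @reading := 0;
SET @prevdate := DATE('1970-01-01');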

SELECT readdate, 
       IF(( To_days(`readdate`) - To_days(@stepdate) ) > 40 
           OR (SELECT COUNT(*) 
               FROM   usage_bill 
               WHERE  readdate BETWEEN d.readdate AND DATE_ADD(d.readdate, 
                                                      INTERVAL 25 DAY 
                                                      ) > 
                      100), @reading := @reading + 1, @reading) AS rownum, 
       IF(( To_days(`readdate`) - To_days(@stepdate) ) > 40 
           OR (SELECT COUNT(*) 
               FROM   usage_bill 
               WHERE  readdate BETWEEN d.readdate AND DATE_ADD(d.readdate, 
                                                      INTERVAL 25 DAY 
                                                      ) > 
                      100), @stepdate := readdate, @stepdate)   AS dd, 
       IF(YEAR(@stepdate)<>YEAR(@prevdate), @reading := 1 XOR 
                                            @prevdate := @stepdate, 
       @reading), 
       @reading                                                 AS nthgroup 
FROM   (SELECT `readdate`, 
               COUNT(*) c 
        FROM   `usage_bill` 
        GROUP  BY `readdate` 
        HAVING COUNT(*) > 20 
        ORDER  BY `usage_bill`.`readdate` ASC) d 

The problem is that when I remove HAVING COUNT(*) > 20, the stray read dates that fall between the main read periods are included and form new groups.
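One way to avoid depending on the evaluation order of user variables is to compute the groups with window functions. This is a sketch only, assuming MySQL 8.0+ and, as in the query above, one row per individual reading in usage_bill. The CTE and column names (daily, flagged, grouped, periods, starts_group, group_id, nth_in_year) are illustrative. It keeps the > 20 filter, so low-volume dates are left out rather than assigned to a group, and it measures the 40-day gap from the previous spike day rather than from the start of the group.

```sql
WITH daily AS (
    -- Spike days only: dates with more than 20 readings (threshold taken from the query above)
    SELECT readdate, COUNT(*) AS readings
    FROM usage_bill
    GROUP BY readdate
    HAVING COUNT(*) > 20
),
flagged AS (
    -- A day starts a group when it is the first spike day
    -- or lies more than 40 days after the previous spike day
    SELECT readdate,
           readings,
           CASE
               WHEN LAG(readdate) OVER (ORDER BY readdate) IS NULL THEN 1
               WHEN DATEDIFF(readdate, LAG(readdate) OVER (ORDER BY readdate)) > 40 THEN 1
               ELSE 0
           END AS starts_group
    FROM daily
),
grouped AS (
    -- The running total of group starts gives every spike day its group id
    SELECT readdate,
           readings,
           SUM(starts_group) OVER (ORDER BY readdate) AS group_id
    FROM flagged
),
periods AS (
    -- Collapse each group to one row
    SELECT group_id,
           MIN(readdate) AS first_day,
           MAX(readdate) AS last_day,
           SUM(readings) AS total_readings
    FROM grouped
    GROUP BY group_id
)
SELECT first_day,
       last_day,
       total_readings,
       -- Number the groups from 1 within each calendar year
       ROW_NUMBER() OVER (PARTITION BY YEAR(first_day) ORDER BY first_day) AS nth_in_year
FROM periods
ORDER BY first_day;
```

With the counts shown in the question, the spike days of 2010 fall into six runs (February, April, June, August, October, December), each more than 40 days after the previous one, so this should number them 1 to 6. If the table already stores one row per date, as in the other answer, the daily step would instead filter on reading > 20.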