+------------+------+
| 2011-03-04 | 6 |
| 2011-03-01 | 1 |
| 2011-02-28 | 4 |
| 2011-02-24 | 1 |
| 2011-02-23 | 1 |
| 2011-02-22 | 2 |
| 2011-02-17 | 1 |
| 2011-02-16 | 22 |
| 2011-02-12 | 2033 |
| 2011-02-10 | 1 |
| 2011-02-07 | 1 |
| 2011-01-04 | 1 |
| 2011-01-03 | 5 |
| 2010-12-26 | 6 |
| 2010-12-16 | 1 |
| 2010-12-15 | 158 |
| 2010-12-14 | 1703 |
| 2010-12-13 | 199 |
| 2010-11-08 | 1 |
| 2010-10-28 | 3 |
| 2010-10-27 | 6 |
| 2010-10-25 | 1 |
| 2010-10-21 | 660 |
| 2010-10-20 | 558 |
| 2010-10-19 | 245 |
| 2010-10-18 | 579 |
| 2010-10-15 | 14 |
| 2010-10-14 | 1 |
| 2010-10-04 | 1 |
| 2010-09-08 | 1 |
| 2010-09-01 | 1 |
| 2010-08-31 | 1 |
| 2010-08-30 | 6 |
| 2010-08-26 | 1 |
| 2010-08-24 | 4 |
| 2010-08-23 | 2 |
| 2010-08-19 | 3 |
| 2010-08-18 | 144 |
| 2010-08-17 | 920 |
| 2010-08-16 | 982 |
| 2010-08-03 | 1 |
| 2010-08-02 | 1 |
| 2010-07-12 | 1 |
| 2010-06-30 | 8 |
| 2010-06-29 | 1 |
| 2010-06-28 | 1 |
| 2010-06-23 | 1 |
| 2010-06-22 | 1 |
| 2010-06-17 | 7 |
| 2010-06-16 | 703 |
| 2010-06-15 | 937 |
| 2010-06-14 | 397 |
| 2010-06-10 | 2 |
| 2010-06-09 | 1 |
| 2010-06-01 | 5 |
| 2010-05-26 | 1 |
| 2010-05-05 | 1 |
| 2010-04-27 | 2 |
| 2010-04-26 | 4 |
| 2010-04-24 | 6 |
| 2010-04-22 | 2 |
| 2010-04-21 | 351 |
| 2010-04-20 | 839 |
| 2010-04-19 | 850 |
| 2010-04-18 | 2 |
| 2010-04-15 | 2 |
| 2010-04-07 | 1 |
| 2010-04-01 | 2 |
| 2010-03-30 | 1 |
| 2010-03-22 | 1 |
| 2010-03-10 | 1 |
| 2010-03-08 | 1 |
| 2010-03-04 | 1 |
| 2010-03-01 | 3 |
| 2010-02-27 | 6 |
| 2010-02-25 | 2 |
| 2010-02-23 | 4 |
| 2010-02-22 | 1 |
| 2010-02-18 | 188 |
| 2010-02-17 | 1210 |
| 2010-02-16 | 646 |
| 2010-01-27 | 1 |
| 2010-01-21 | 1 |
| 2010-01-07 | 1 |
| 2010-01-06 | 2 |
| 2010-01-04 | 12 |
+------------+------+
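I have this dataset covering the last few years. I want to group similar reading dates together. For example, take the range from 2011-02-07 to 2011-03-04 and combine it into one group as reading 1 of that year.
Or combine 2010-10-04 through 2010-10-28 into reading 5 of that year.
The grouping should be based on the second column (the number of readings per date): the spikes need to be grouped together. There will be 6 periods per year, at least 40 days apart.
How can I do this in MySQL?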
Answer 0 (score: 2)
I threw your sample data into a simple table:
CREATE TABLE `usage_bill` (
`readdate` date default NULL,
`reading` int(11) default NULL
);
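To reproduce the behaviour on a small scale, some of the question's rows can be loaded into this table with a plain multi-row INSERT. This sketch copies only the October 2010 rows from the question's table; loading the full dataset works the same way.

```sql
-- Rows copied from the question's table (October 2010 only).
INSERT INTO usage_bill (readdate, reading) VALUES
  ('2010-10-04', 1),
  ('2010-10-14', 1),
  ('2010-10-15', 14),
  ('2010-10-18', 579),
  ('2010-10-19', 245),
  ('2010-10-20', 558),
  ('2010-10-21', 660),
  ('2010-10-25', 1),
  ('2010-10-27', 6),
  ('2010-10-28', 3);
```

This month is a useful test case because it has two local maxima (2010-10-18 and 2010-10-21) inside one spike.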
I've been able to detect the peaks in a generic way:
SET @seq1 := 0;
SET @seq2 := 0;
SET @lastdiff := 0;
SELECT readdate, reading FROM (
    -- pair each row (ref1) with the next row by date (ref2)
    SELECT ref1.readdate, ref1.reading, ref2.reading - ref1.reading AS diff,
           -- @lastdiff still holds the previous pair's diff (assumes rows are processed in date order):
           -- ref1 is a peak if the reading rose into it and falls after it
           (@lastdiff > 0) && (ref2.reading - ref1.reading) < 0 AS peak,
           @lastdiff := ref2.reading - ref1.reading AS lastdiff
    FROM
        (SELECT @seq1 := @seq1 + 1 AS rowNum, readdate, reading FROM usage_bill ORDER BY readdate) AS ref1,
        (SELECT @seq2 := @seq2 + 1 AS rowNum, readdate, reading FROM usage_bill ORDER BY readdate) AS ref2
    WHERE ref1.rowNum + 1 = ref2.rowNum
) AS peaks
WHERE peak = 1;
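In theory it should be possible to add ORDER BY reading DESC LIMIT 6 to get the largest peaks, but in practice not every spike is a clean curve (October 2010, for example, has two local peaks, on 2010-10-18 and 2010-10-21).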
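On MySQL 8.0 or later (an assumption; the query above is written for older versions without window functions), the same neighbour comparison can be expressed with LAG and LEAD, which removes the dependence on the evaluation order of user variables. This sketch also ranks the peaks within each calendar year instead of applying one global LIMIT 6; that per-year ranking is an addition here, not part of the answer above.

```sql
-- Sketch only. Assumes MySQL 8.0+ and the usage_bill table above.
WITH neighbours AS (
  -- each reading next to the readings of the previous and next dates
  SELECT readdate,
         reading,
         LAG(reading)  OVER (ORDER BY readdate) AS prev_reading,
         LEAD(reading) OVER (ORDER BY readdate) AS next_reading
  FROM usage_bill
),
peaks AS (
  -- a peak is higher than both neighbours; the first and last rows have a
  -- NULL neighbour, the comparison is not true, so they are never peaks
  SELECT readdate,
         reading,
         ROW_NUMBER() OVER (PARTITION BY YEAR(readdate)
                            ORDER BY reading DESC) AS peak_rank
  FROM neighbours
  WHERE reading > prev_reading
    AND reading > next_reading
)
-- keep the six largest peaks of each calendar year
SELECT readdate, reading
FROM peaks
WHERE peak_rank <= 6
ORDER BY readdate;
```

The caveat about October 2010 still applies: a spike with two local maxima contributes two peaks, so the six largest peaks of a year are not guaranteed to fall in six distinct periods.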
Not sure whether this helps...
Answer 1 (score: 0)
I've managed to get it working using user variables, but it has a small problem.
SET @stepdate := DATE('1970-01-01');
SET @reading := 0;
SET @prevdate := DATE('1970-01-01');
SELECT readdate,
       IF((TO_DAYS(`readdate`) - TO_DAYS(@stepdate)) > 40
          OR (SELECT COUNT(*)
              FROM usage_bill
              WHERE readdate BETWEEN d.readdate AND DATE_ADD(d.readdate, INTERVAL 25 DAY) > 100),
          @reading := @reading + 1, @reading) AS rownum,
       IF((TO_DAYS(`readdate`) - TO_DAYS(@stepdate)) > 40
          OR (SELECT COUNT(*)
              FROM usage_bill
              WHERE readdate BETWEEN d.readdate AND DATE_ADD(d.readdate, INTERVAL 25 DAY) > 100),
          @stepdate := readdate, @stepdate) AS dd,
       IF(YEAR(@stepdate) <> YEAR(@prevdate),
          @reading := 1 XOR @prevdate := @stepdate,
          @reading),
       @reading AS nthgroup
FROM (SELECT `readdate`, COUNT(*) c
      FROM `usage_bill`
      GROUP BY `readdate`
      HAVING COUNT(*) > 20
      ORDER BY `usage_bill`.`readdate` ASC) d;
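The problem is that when I remove HAVING COUNT(*) > 20, the low-count reading dates between the spikes are no longer grouped correctly and start new groups of their own.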
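A different approach that avoids user variables, sketched under these assumptions: MySQL 8.0+ (window functions and CTEs), the usage_bill(readdate, reading) table from Answer 0 holding one row per date with the per-date count, and two parameters taken from this thread (a date is a spike when reading > 20, the HAVING threshold above, and spikes more than 40 days apart belong to different periods). Step one clusters only the spike dates (a gaps-and-islands query) and numbers the clusters within each year; step two assigns every date, including the low-count ones, to the nearest cluster, so they never start a group of their own.

```sql
-- Sketch only. Assumes MySQL 8.0+ and usage_bill(readdate, reading) with one
-- row per date. Assumed parameters: spike = reading > 20; a new period starts
-- when the gap between consecutive spike dates exceeds 40 days.
WITH spikes AS (
  -- spike dates, each paired with the previous spike date
  SELECT readdate,
         LAG(readdate) OVER (ORDER BY readdate) AS prev_spike
  FROM usage_bill
  WHERE reading > 20
),
spike_groups AS (
  -- running count of period boundaries = a global period id
  SELECT readdate,
         SUM(CASE WHEN prev_spike IS NULL
                    OR DATEDIFF(readdate, prev_spike) > 40
                  THEN 1 ELSE 0 END) OVER (ORDER BY readdate) AS grp
  FROM spikes
),
periods AS (
  -- first and last spike date of each period
  SELECT grp, MIN(readdate) AS first_spike, MAX(readdate) AS last_spike
  FROM spike_groups
  GROUP BY grp
),
numbered AS (
  -- number the periods 1, 2, 3, ... within the year of their first spike
  SELECT grp, first_spike, last_spike,
         ROW_NUMBER() OVER (PARTITION BY YEAR(first_spike)
                            ORDER BY first_spike) AS period_no
  FROM periods
),
assigned AS (
  -- every date paired with every period and ranked by distance in days;
  -- rn = 1 marks the nearest period (ties go to the earlier period)
  SELECT u.readdate, u.reading,
         YEAR(n.first_spike) AS period_year, n.period_no,
         ROW_NUMBER() OVER (
           PARTITION BY u.readdate
           ORDER BY CASE
                      WHEN u.readdate < n.first_spike THEN DATEDIFF(n.first_spike, u.readdate)
                      WHEN u.readdate > n.last_spike  THEN DATEDIFF(u.readdate, n.last_spike)
                      ELSE 0
                    END,
                    n.first_spike) AS rn
  FROM usage_bill AS u
  CROSS JOIN numbered AS n
)
SELECT readdate, reading, period_year, period_no
FROM assigned
WHERE rn = 1
ORDER BY readdate;
```

Traced by hand against the question's data, this puts 2011-02-07 through 2011-03-04 in period 1 of 2011 and 2010-10-04 through 2010-10-28 in period 5 of 2010, matching the examples in the question. Dates that sit between two periods simply go to the closer one (2011-01-03, for instance, lands in the last period of 2010), so the "nearest spike" rule is itself an assumption worth checking against what you expect.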