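MySQL "group by" while preserving row order

Time: 2014-05-07 14:14:00

Tags: mysql sql group-by

I'm having trouble removing duplicates in a SELECT query while still taking the order of the rows into account. I have the following sample data: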

myDate      myValue
---------------------------
2014-01-01  100
2014-01-02  100
2014-01-03  200
2014-01-04  100
2014-01-05  100
2014-01-06  100
2014-01-07  300

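I need a query that removes the duplicates on consecutive following dates, producing the result below. Note that the value 100 is returned more than once in the result, unlike with my current query.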

myDate      myValue
---------------------------
2014-01-01  100
2014-01-03  200
2014-01-04  100
2014-01-07  300

What I have so far is:

SELECT * FROM (
   SELECT myDate, myValue
   FROM testtable
   ORDER BY myDate
) AS t_temp GROUP BY myValue;

Any ideas on how to improve the query to produce the desired result?
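For anyone who wants to reproduce this, the sample data can be set up roughly as follows. The column types and the primary key are assumptions, since the question does not state them.

```sql
-- Assumed schema: the question does not give column types or keys.
CREATE TABLE testtable (
    myDate  DATE NOT NULL PRIMARY KEY,
    myValue INT  NOT NULL
);

-- Sample rows exactly as listed in the question.
INSERT INTO testtable (myDate, myValue) VALUES
    ('2014-01-01', 100),
    ('2014-01-02', 100),
    ('2014-01-03', 200),
    ('2014-01-04', 100),
    ('2014-01-05', 100),
    ('2014-01-06', 100),
    ('2014-01-07', 300);
```

Because the attempt groups by myValue alone, it can return at most one row per distinct value (three rows here), so 100 can never appear twice. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) the query is rejected outright, because myDate is neither grouped nor aggregated.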

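4 answers:

Answer 0: (score: 1)

I haven't verified this, but I think it gives you what you need. The inner query picks up every row whose value does not match the previous row's value, using @previous to keep track of the previous row. Otherwise it produces a row of NULLs. Finally, the outer query eliminates the NULL rows.

For example, when it looks at the first row, it sees that myValue does not match @previous, because @previous is empty, so it takes the whole row. When it looks at the second row, it sees that myValue equals @previous, so it produces NULLs. When it looks at the third row, it sees that myValue (200) does not equal @previous (100), so it takes the whole row. It carries on like this to the end. The outer query then eliminates all the NULL rows.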

SET @previous := '';

SELECT
    myDate,
    myValue
FROM (
    SELECT
        IF( myValue != @previous, myDate, NULL ) AS myDate,
        IF( myValue != @previous, myValue, NULL ) AS myValue,
        @previous := myValue
    FROM testtable
    ORDER BY myDate  -- rows must be read in date order for @previous to be meaningful
) temp
WHERE myDate IS NOT NULL;

This can also be written as follows, initializing @previous inside the query:

SELECT
    myDate,
    myValue
FROM (
    SELECT
        IF( myValue != @previous, myDate, NULL ) AS myDate,
        IF( myValue != @previous, myValue, NULL ) AS myValue,
        @previous := myValue
    FROM my_table
       , (SELECT @previous := '') val
   ORDER
      BY myDate
) temp
WHERE myDate IS NOT NULL;

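Answer 1: (score: 0)

In SQL you can use LAG or LEAD to look at the previous or next record, but MySQL does not support them (they only arrived with window functions in MySQL 8.0).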
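For readers on MySQL 8.0 or later, where window functions are available, the comparison with the previous row can be sketched with LAG directly. This is not part of the original answer. The table name mytable is borrowed from the query further down, and the sketch assumes myValue is never NULL.

```sql
-- Sketch for MySQL 8.0+ only; assumes myValue is NOT NULL.
SELECT myDate, myValue
FROM (
    SELECT myDate,
           myValue,
           -- value of the row immediately before this one in date order
           LAG(myValue) OVER (ORDER BY myDate) AS prevValue
    FROM mytable
) AS t
-- keep the first row (no predecessor) and every row where the value changes
WHERE prevValue IS NULL OR prevValue <> myValue
ORDER BY myDate;
```

Since LAG works on row order rather than on calendar days, gaps in the dates make no difference to this version.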

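On versions without them, if there is an entry for every day, you can join each row to the previous day's row and compare it with the current value: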

select 
  mytable.mydate, 
  mytable.myvalue
from mytable 
left outer join mytable prev on adddate(prev.mydate, interval 1 day) = mytable.mydate
where prev.myvalue is null or prev.myvalue != mytable.myvalue
order by mydate;

If there are gaps, you have to look at all earlier records and take the latest date among them to find the previous record.
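A minimal sketch of that gap-tolerant variant, not from the original answer: the previous record is the one with the latest date before the current row. It assumes mydate is unique and myvalue is never NULL.

```sql
-- Sketch: handles gaps by joining to the most recent earlier row.
-- Assumes mydate is unique and myvalue is NOT NULL.
SELECT cur.mydate,
       cur.myvalue
FROM mytable AS cur
LEFT OUTER JOIN mytable AS prev
  ON prev.mydate = (SELECT MAX(e.mydate)      -- latest date before the current row
                    FROM mytable AS e
                    WHERE e.mydate < cur.mydate)
-- no earlier row at all, or the value changed
WHERE prev.myvalue IS NULL OR prev.myvalue <> cur.myvalue
ORDER BY cur.mydate;
```

The correlated subquery runs once per row, so on a large table an index on mydate is likely to matter.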

Answer 2: (score: 0)

Here's one way (note: my data set differs slightly from yours)...

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table 
(myDate      DATE NOT NULL PRIMARY KEY
,myValue INT NOT NULL
);

INSERT INTO my_table VALUES
('2014-01-01',100),
('2014-01-02',100),
('2014-01-03',200),
('2014-01-04',100),
('2014-01-07',100),
('2014-01-08',100),
('2014-01-09',300);

SELECT * FROM my_table;
+------------+---------+
| myDate     | myValue |
+------------+---------+
| 2014-01-01 |     100 |
| 2014-01-02 |     100 |
| 2014-01-03 |     200 |
| 2014-01-04 |     100 |
| 2014-01-07 |     100 |
| 2014-01-08 |     100 |
| 2014-01-09 |     300 |
+------------+---------+

-- rnk numbers the rows in date order; the alias avoids RANK, a reserved word as of MySQL 8.0
SELECT a.myDate 
     , a.myValue
  FROM (SELECT x.*, COUNT(*) rnk FROM my_table x JOIN my_table y ON y.myDate <= x.myDate GROUP BY x.myDate) a
  LEFT 
  JOIN (SELECT x.*, COUNT(*) rnk FROM my_table x JOIN my_table y ON y.myDate <= x.myDate GROUP BY x.myDate) b 
    ON b.myValue = a.myValue
   AND b.rnk = a.rnk - 1
  LEFT 
  JOIN (SELECT x.*, COUNT(*) rnk FROM my_table x JOIN my_table y ON y.myDate <= x.myDate GROUP BY x.myDate) c 
    ON c.myValue = a.myValue
   AND c.rnk >= a.rnk
  LEFT 
  JOIN (SELECT x.*, COUNT(*) rnk FROM my_table x JOIN my_table y ON y.myDate <= x.myDate GROUP BY x.myDate) d 
    ON d.myValue = a.myValue
   AND d.rnk = c.rnk + 1 
 WHERE b.rnk IS NULL 
   AND c.rnk IS NOT NULL
   AND d.rnk IS NULL
 GROUP 
    BY a.myDate  -- one output row per starting row; both selected columns are grouped
     , a.myValue
 ORDER
    BY a.myDate;

+------------+---------+
| myDate     | myValue |
+------------+---------+
| 2014-01-01 |     100 |
| 2014-01-03 |     200 |
| 2014-01-04 |     100 |
| 2014-01-09 |     300 |
+------------+---------+

If the days really are consecutive, with no gaps, then this can be simplified.
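Whether or not the dates are consecutive, the row-numbering idea also seems to work with a single self-join: a row is kept when the row numbered one lower is missing or has a different value. This is a sketch rather than the answerer's query. The alias rnk is an assumption, chosen because RANK is a reserved word as of MySQL 8.0.

```sql
-- Sketch: number the rows by date, then compare each row with the one before it.
SELECT a.myDate
     , a.myValue
  FROM (SELECT x.*, COUNT(*) AS rnk          -- rnk = position of the row in date order
          FROM my_table x
          JOIN my_table y ON y.myDate <= x.myDate
         GROUP BY x.myDate) a
  LEFT
  JOIN (SELECT x.*, COUNT(*) AS rnk
          FROM my_table x
          JOIN my_table y ON y.myDate <= x.myDate
         GROUP BY x.myDate) b
    ON b.rnk = a.rnk - 1                     -- the immediately preceding row
 WHERE b.rnk IS NULL                         -- first row has no predecessor
    OR b.myValue <> a.myValue                -- value changed
 ORDER BY a.myDate;
```

On the data set above this keeps the rows dated 2014-01-01, 2014-01-03, 2014-01-04 and 2014-01-09.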

Answer 3: (score: -2)

Use the MIN() aggregate function so that you always get the lowest date for each myValue group:

SELECT MIN(myDate), myValue
FROM testtable
GROUP BY myValue
ORDER BY myValue
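As written, this returns one row per distinct value, so 100 appears only once, which is not what the question asks for. For MIN() to work, the grouping has to be per run of equal values rather than per value. A minimal sketch, assuming MySQL 8.0+ for ROW_NUMBER(); the names grp and firstDate are made up for illustration.

```sql
-- Sketch for MySQL 8.0+: group by runs of equal values, not by value alone.
SELECT MIN(myDate) AS firstDate,
       myValue
FROM (
    SELECT myDate,
           myValue,
           -- The overall row number minus the row number within the same value
           -- stays constant inside a run and changes when the run is interrupted.
           ROW_NUMBER() OVER (ORDER BY myDate)
         - ROW_NUMBER() OVER (PARTITION BY myValue ORDER BY myDate) AS grp
    FROM testtable
) AS t
GROUP BY myValue, grp
ORDER BY firstDate;
```

With the question's sample data the runs start on 2014-01-01 (100), 2014-01-03 (200), 2014-01-04 (100) and 2014-01-07 (300).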