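I'm having trouble removing duplicates in a SELECT query while still taking the order of the rows into account. I have the following sample data: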
myDate myValue
---------------------------
2014-01-01 100
2014-01-02 100
2014-01-03 200
2014-01-04 100
2014-01-05 100
2014-01-06 100
2014-01-07 300
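I need a query that removes duplicates only when they fall on consecutive dates, producing the result below. Note that the value 100 is returned more than once in the result, which is not what my current query does.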
myDate myValue
---------------------------
2014-01-01 100
2014-01-03 200
2014-01-04 100
2014-01-07 300
Here is what I have so far:
SELECT * FROM (
SELECT myDate, myValue
FROM testtable
ORDER BY myDate
) AS t_temp GROUP BY myValue;
Any ideas on how to improve the query so that it produces the desired result?
Answer 0 (score: 1)
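I haven't verified this, but I think it will give you what you need. The inner query grabs every row whose value does not match the value of the previous row, using @previous to keep track of the previous row. For every other row it produces a row of NULLs. Finally, the outer query eliminates the NULL rows.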
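For example, when it looks at the first row, it sees that myValue does not match @previous, because @previous is empty, so it grabs the whole row. When it looks at the second row, it sees that myValue equals @previous, so it produces NULL. When it looks at the third row, it sees that myValue is not equal to 100, so it grabs the whole row. It carries on like this to the last row. The outer query then eliminates all the NULL rows.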
SET @previous := '';
SELECT
myDate,
myValue
FROM (
SELECT
IF( myValue != @previous, myDate, NULL ) AS myDate,
IF( myValue != @previous, myValue, NULL ) AS myValue,
@previous := myValue
FROM testtable
ORDER BY myDate
) temp
WHERE myDate IS NOT NULL;
This can also be written as follows, initializing @previous inside the query:
SELECT
myDate,
myValue
FROM (
SELECT
IF( myValue != @previous, myDate, NULL ) AS myDate,
IF( myValue != @previous, myValue, NULL ) AS myValue,
@previous := myValue
FROM testtable
, (SELECT @previous := '') val
ORDER
BY myDate
) temp
WHERE myDate IS NOT NULL;
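The row-by-row logic described above can be sketched outside the database. This is only an illustration in Python, not a replacement for the query: the function name `drop_consecutive_duplicates` and the variable `previous` (standing in for @previous) are made up for the example, and the rows are the sample data from the question.

```python
from datetime import date

# Sample rows from the question: (myDate, myValue).
rows = [
    (date(2014, 1, 1), 100),
    (date(2014, 1, 2), 100),
    (date(2014, 1, 3), 200),
    (date(2014, 1, 4), 100),
    (date(2014, 1, 5), 100),
    (date(2014, 1, 6), 100),
    (date(2014, 1, 7), 300),
]


def drop_consecutive_duplicates(rows):
    """Keep a row only when its value differs from the previous row's value."""
    kept = []
    previous = None  # plays the role of @previous; None means no row seen yet
    for my_date, my_value in sorted(rows):  # same effect as ORDER BY myDate
        if my_value != previous:
            kept.append((my_date, my_value))
        previous = my_value  # remember this row for the next comparison
    return kept


result = drop_consecutive_duplicates(rows)
for my_date, my_value in result:
    print(my_date, my_value)
```

The sketch makes the dependence on ordering explicit: the comparison only means something if the rows are visited in date order.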
Answer 1 (score: 0)
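In SQL you can use LAG or LEAD to look at the previous or next record, but MySQL versions before 8.0 do not support them.
So, provided there is an entry for every single day, you can select the previous day's row and compare it with the current value: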
select
mytable.mydate,
mytable.myvalue
from mytable
left outer join mytable prev on adddate(prev.mydate, interval 1 day) = mytable.mydate
where prev.myvalue is null or prev.myvalue != mytable.myvalue
order by mydate;
If there are gaps, you have to select all earlier records and take the latest date among them to find the previous record.
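One way the gap case might look is a correlated subquery that picks the latest earlier date as the "previous" row. The sketch below runs the SQL in SQLite through Python's `sqlite3` module so that it is self-contained. It has not been run against MySQL, although the query uses only standard constructs. The table name `my_table` and the sample rows (with a gap between 2014-01-04 and 2014-01-07) are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (myDate TEXT PRIMARY KEY, myValue INTEGER NOT NULL)"
)
# Illustrative data with a gap: no rows for 2014-01-05 and 2014-01-06.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("2014-01-01", 100),
        ("2014-01-02", 100),
        ("2014-01-03", 200),
        ("2014-01-04", 100),
        ("2014-01-07", 100),
        ("2014-01-08", 100),
        ("2014-01-09", 300),
    ],
)

# For each row, join the row with the latest earlier date (if any) and keep
# the current row only when there is no such row or its value differs.
query = """
SELECT cur.myDate, cur.myValue
FROM my_table AS cur
LEFT JOIN my_table AS prev
  ON prev.myDate = (SELECT MAX(p.myDate)
                    FROM my_table AS p
                    WHERE p.myDate < cur.myDate)
WHERE prev.myValue IS NULL OR prev.myValue <> cur.myValue
ORDER BY cur.myDate
"""

result = conn.execute(query).fetchall()
for row in result:
    print(*row)
```

The subquery runs once per row, so on a large table an index on the date column (here the primary key) matters.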
Answer 2 (score: 0)
Here is one way (note: my data set differs slightly from yours)...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(myDate DATE NOT NULL PRIMARY KEY
,myValue INT NOT NULL
);
INSERT INTO my_table VALUES
('2014-01-01',100),
('2014-01-02',100),
('2014-01-03',200),
('2014-01-04',100),
('2014-01-07',100),
('2014-01-08',100),
('2014-01-09',300);
SELECT * FROM my_table;
+------------+---------+
| myDate | myValue |
+------------+---------+
| 2014-01-01 | 100 |
| 2014-01-02 | 100 |
| 2014-01-03 | 200 |
| 2014-01-04 | 100 |
| 2014-01-07 | 100 |
| 2014-01-08 | 100 |
| 2014-01-09 | 300 |
+------------+---------+
SELECT a.myDate
, a.myValue
FROM (SELECT x.*, COUNT(*) rank FROM my_table x JOIN my_table y ON y.myDate <= x.myDate GROUP BY x.myDate) a
LEFT
JOIN (SELECT x.*, COUNT(*) rank FROM my_table x JOIN my_table y ON y.myDate <= x.myDate GROUP BY x.myDate) b
ON b.myValue = a.myValue
AND b.rank = a.rank - 1
LEFT
JOIN (SELECT x.*, COUNT(*) rank FROM my_table x JOIN my_table y ON y.myDate <= x.myDate GROUP BY x.myDate) c
ON c.myValue = a.myValue
AND c.rank >= a.rank
LEFT
JOIN (SELECT x.*, COUNT(*) rank FROM my_table x JOIN my_table y ON y.myDate <= x.myDate GROUP BY x.myDate) d
ON d.myValue = a.myValue
AND d.rank = c.rank + 1
WHERE b.rank IS NULL
AND c.rank IS NOT NULL
AND d.rank IS NULL
GROUP
BY a.rank;
+------------+---------+
| myDate | myValue |
+------------+---------+
| 2014-01-01 | 100 |
| 2014-01-03 | 200 |
| 2014-01-04 | 100 |
| 2014-01-09 | 300 |
+------------+---------+
If the dates really are consecutive, with no gaps, this can be simplified.
Answer 3 (score: -2)
Use the MIN() aggregate function so that you always get the lowest date for each myValue group:
SELECT MIN(myDate), myValue
FROM testtable
GROUP BY myValue
ORDER BY myValue
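A quick check of this query against the sample data from the question is sketched below. It runs in SQLite through Python's `sqlite3` module rather than in MySQL, which is an assumption of the sketch, although plain GROUP BY with MIN() is expected to behave the same way in both. Grouping by myValue yields one row per distinct value, so the second run of 100 starting on 2014-01-04 does not appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (myDate TEXT, myValue INTEGER)")
# Sample data from the question.
conn.executemany(
    "INSERT INTO testtable VALUES (?, ?)",
    [
        ("2014-01-01", 100),
        ("2014-01-02", 100),
        ("2014-01-03", 200),
        ("2014-01-04", 100),
        ("2014-01-05", 100),
        ("2014-01-06", 100),
        ("2014-01-07", 300),
    ],
)

# The query from this answer: one output row per distinct myValue.
grouped = conn.execute(
    "SELECT MIN(myDate), myValue FROM testtable "
    "GROUP BY myValue ORDER BY myValue"
).fetchall()

for row in grouped:
    print(*row)
```

The question asks for four rows, with 100 appearing twice, so this grouping alone does not reproduce the desired result.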