Selecting the minimum value in consecutive groups in a SQL query (MySQL)

Time: 2017-04-06 16:04:11

Tags: mysql sql

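I have an awkward situation: I have a table that stores data about how values change over time.

I have a column groupId, which groups together the changes to a given value. I have value, which stores what the value changed to, and date, which stores the date on which the change occurred.

E.g. if the value was a on 20000101 and changed to b on 20010101, we might have the following: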

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      42 | a     | 20000101 |
|      42 | b     | 20010101 |
+---------+-------+----------+

Now, to make things interesting, we can have records that do not represent a material change in value, e.g.:

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      43 | a     | 20000101 |
|      43 | b     | 20010101 |
|      43 | b     | 20020101 |
+---------+-------+----------+

For extra fun, a value can change to something else and then change back to what it was before, e.g.:

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      44 | a     | 20000101 |
|      44 | b     | 20010101 |
|      44 | a     | 20020101 |
+---------+-------+----------+

Putting these together, we can have a group that looks like this:

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      45 | a     | 20000101 |
|      45 | a     | 20010101 |
|      45 | b     | 20020101 |
|      45 | b     | 20030101 |
|      45 | a     | 20040101 |
|      45 | a     | 20050101 |
|      45 | b     | 20060101 |
|      45 | b     | 20070101 |
+---------+-------+----------+

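What I need to do is write a query that returns the rows for each group but discards any of these immaterial changes. For group 45 above, this would mean returning: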

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      45 | a     | 20000101 |
|      45 | b     | 20020101 |
|      45 | a     | 20040101 |
|      45 | b     | 20060101 |
+---------+-------+----------+

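I.e. we keep only the earliest date of each "consecutive" group of (groupId, value).

Is there a reasonable way to achieve this?

I'm doing this in MySQL, though a solution that doesn't depend on it would be ideal.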
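For reference, the rule the question describes can be written down in a few lines of Python. This is only a sketch of the expected behaviour, not a SQL solution; the function name first_of_each_run and the in-memory row list are made up for illustration, and the sample rows are copied from groups 43 and 45 above.

```python
from itertools import groupby

# Sample rows copied from the question's tables: (groupId, value, date).
rows = [
    (45, "a", 20000101), (45, "a", 20010101),
    (45, "b", 20020101), (45, "b", 20030101),
    (45, "a", 20040101), (45, "a", 20050101),
    (45, "b", 20060101), (45, "b", 20070101),
    (43, "a", 20000101), (43, "b", 20010101), (43, "b", 20020101),
]


def first_of_each_run(rows):
    """Keep the earliest row of every consecutive (groupId, value) run."""
    # Sort by groupId, then date, so that rows of one run sit next to each other.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    # groupby only merges adjacent rows with equal keys, which is exactly
    # the "consecutive" rule: a, a, b, b, a yields three runs, not two.
    runs = groupby(ordered, key=lambda r: (r[0], r[1]))
    return [next(run) for _, run in runs]


result = first_of_each_run(rows)
for row in result:
    print(row)
```

Any SQL answer should produce the same rows as this function for the same input, which makes it a convenient cross-check.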

2 answers:

Answer 0 (score: 0)

Using the method in this answer to simulate lag() in MySQL:

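-- Track the previous row's group and value; NULL means "no previous row yet".
SET @prev_group = NULL, @prev_value = NULL;
select groupId, value, date
from (
  select t.groupId, t.value, t.date,
         -- 1 when the row starts a new run: a new groupId or a changed value.
         -- Relies on left-to-right evaluation, which MySQL does not guarantee.
         case when @prev_group = t.groupId and @prev_value = t.value then 0 else 1 end as is_start,
         @prev_group := t.groupId as g,
         @prev_value := t.value as v
  from t
  order by t.groupId, t.date
  ) a
where is_start = 1;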

rextester demo: http://rextester.com/PWF35736

returns:

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      45 | a     | 20000101 |
|      45 | b     | 20020101 |
|      45 | a     | 20040101 |
|      45 | b     | 20060101 |
+---------+-------+----------+
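Where a real LAG() window function is available (MySQL 8.0 and later, and most other current engines), the session variables are not needed. The sketch below runs such a query against an in-memory SQLite database from Python so it can be tried without a MySQL server. It assumes SQLite 3.25 or newer (for window functions), a table named t as in the answer, and that value is never NULL; the names LAG_QUERY and lag_rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (groupId INTEGER, value TEXT, date INTEGER)")
# Groups 44 and 45 from the question; group 44 ends with 'a' and
# group 45 starts with 'a', so the per-group partitioning matters.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (44, "a", 20000101), (44, "b", 20010101), (44, "a", 20020101),
        (45, "a", 20000101), (45, "a", 20010101),
        (45, "b", 20020101), (45, "b", 20030101),
        (45, "a", 20040101), (45, "a", 20050101),
        (45, "b", 20060101), (45, "b", 20070101),
    ],
)

LAG_QUERY = """
SELECT groupId, value, date
FROM (
    SELECT groupId, value, date,
           -- Previous value within the same groupId, ordered by date;
           -- NULL for the first row of each group.
           LAG(value) OVER (PARTITION BY groupId ORDER BY date) AS prev_value
    FROM t
) AS a
-- Keep rows that open a group or differ from the row before them.
WHERE prev_value IS NULL OR prev_value <> value
ORDER BY groupId, date
"""

lag_rows = conn.execute(LAG_QUERY).fetchall()
for row in lag_rows:
    print(row)
```

PARTITION BY groupId keeps the comparison from crossing group boundaries, so the first row of every group is always kept.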

Answer 1 (score: 0)

First, we need to build information into the table itself that tells us when a change is immaterial. In this case, we know that a record is immaterial when two identical values appear next to each other in time. We can do this by assigning a "rank" that groups immaterial and material records together. Assuming our table is called A, the following query:

select    a1.groupID
        , a1.value
        , a1.date
        , COUNT(a2.groupID) as Ranked
    from A a1
    left join A a2
        on a2.groupID = a1.groupID
        and a2.value <> a1.value
        and a2.date < a1.date
    group by  a1.groupID
            , a1.value
            , a1.date
    order by  a1.date

produces this table:

+ ------- + ----- + ---------- + ------ +
| groupId | value | date       | Ranked |
+ ------- + ----- + ---------- + ------ +
| 45      | a     | 2000-01-01 | 0      |
| 45      | a     | 2001-01-01 | 0      |
| 45      | b     | 2002-01-01 | 2      |
| 45      | b     | 2003-01-01 | 2      |
| 45      | a     | 2004-01-01 | 2      |
| 45      | a     | 2005-01-01 | 2      |
| 45      | b     | 2006-01-01 | 4      |
| 45      | b     | 2007-01-01 | 4      |
+ ------- + ----- + ---------- + ------ +

Then by grouping on groupId, value, and ranked, we can select the min(date). Since MySQL versions before 8.0 do not support CTEs, we'll just use a temporary table:

create temporary table Ranking as (
    select    a1.groupID
            , a1.value
            , a1.date
            , COUNT(a2.groupID) as Ranked
        from A a1
        left join A a2
            on a2.groupID = a1.groupID
            and a2.value <> a1.value
            and a2.date < a1.date
        group by  a1.groupID
                , a1.value
                , a1.date
        order by  a1.date
);

select    groupId
        , value
        , min(date) as date
    from Ranking
    group by  groupId
            , value
            , ranked
    order by date

and voila, we get the desired result

+ ------- + ----- + ---------- +
| groupId | value | date       |
+ ------- + ----- + ---------- +
| 45      | a     | 2000-01-01 |
| 45      | b     | 2002-01-01 |
| 45      | a     | 2004-01-01 |
| 45      | b     | 2006-01-01 |
+ ------- + ----- + ---------- +
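The counting approach uses only a self-join and GROUP BY, so it should run on engines without window functions. As a rough check, the sketch below folds the temporary table into a derived table and runs the query against an in-memory SQLite database from Python. The table name A follows the answer; the alias first_date, the names RANK_QUERY and rank_rows, and the extra group 43 rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (groupId INTEGER, value TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [
        (43, "a", "2000-01-01"), (43, "b", "2001-01-01"), (43, "b", "2002-01-01"),
        (45, "a", "2000-01-01"), (45, "a", "2001-01-01"),
        (45, "b", "2002-01-01"), (45, "b", "2003-01-01"),
        (45, "a", "2004-01-01"), (45, "a", "2005-01-01"),
        (45, "b", "2006-01-01"), (45, "b", "2007-01-01"),
    ],
)

RANK_QUERY = """
SELECT groupId, value, MIN(date) AS first_date
FROM (
    -- ranked = number of earlier rows in the same group with a different
    -- value; it is constant within a run and grows between runs.
    SELECT a1.groupId, a1.value, a1.date, COUNT(a2.groupId) AS ranked
    FROM A a1
    LEFT JOIN A a2
        ON a2.groupId = a1.groupId
        AND a2.value <> a1.value
        AND a2.date < a1.date
    GROUP BY a1.groupId, a1.value, a1.date
) AS ranking
GROUP BY groupId, value, ranked
ORDER BY groupId, first_date
"""

rank_rows = conn.execute(RANK_QUERY).fetchall()
for row in rank_rows:
    print(row)
```

Note that the self-join compares every row with every earlier row of its group, so on large groups this is likely to be slower than a single ordered pass.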