Question

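I have an awkward situation: I have a table that stores data about how values change over time.

I have a column groupId, which groups together the changes to a given value. I have value, which stores what the value changed to, and date, which stores the date the change happened.

E.g. if the value changed to a on 20000101 and to b on 20010101, we might have the following: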

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      42 | a     | 20000101 |
|      42 | b     | 20010101 |
+---------+-------+----------+

Now, to make things interesting, we can have records that do not represent a material change in value, e.g.:

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      43 | a     | 20000101 |
|      43 | b     | 20010101 |
|      43 | b     | 20020101 |
+---------+-------+----------+

For extra fun, a value can change to something else and then change back to what it was before, e.g.:

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      44 | a     | 20000101 |
|      44 | b     | 20010101 |
|      44 | a     | 20020101 |
+---------+-------+----------+

Putting these together, we can have a group that looks like this:

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      45 | a     | 20000101 |
|      45 | a     | 20010101 |
|      45 | b     | 20020101 |
|      45 | b     | 20030101 |
|      45 | a     | 20040101 |
|      45 | a     | 20050101 |
|      45 | b     | 20060101 |
|      45 | b     | 20070101 |
+---------+-------+----------+

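What I need to do is write a query that returns the rows for each group but discards any of these non-material changes. For group 45 above, this would mean returning: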

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      45 | a     | 20000101 |
|      45 | b     | 20020101 |
|      45 | a     | 20040101 |
|      45 | b     | 20060101 |
+---------+-------+----------+

I.e. we keep only the earliest date from each "consecutive" group of (groupId, value).
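
To pin the rule down, here is a minimal Python sketch of the expected behaviour. It is not a SQL solution, only a reference for what the query should return; the function name first_of_each_run is made up for illustration, and the rows are group 45 from above.

```python
from itertools import groupby

# Rows for group 45 from the question, as (groupId, value, date) tuples.
rows = [
    (45, "a", 20000101),
    (45, "a", 20010101),
    (45, "b", 20020101),
    (45, "b", 20030101),
    (45, "a", 20040101),
    (45, "a", 20050101),
    (45, "b", 20060101),
    (45, "b", 20070101),
]


def first_of_each_run(rows):
    """Keep the earliest row of every consecutive run of equal (groupId, value)."""
    # Sort by groupId, then date, so that runs are adjacent.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    # groupby only merges adjacent items, so a value that reappears
    # later in the same group starts a new run.
    return [next(run) for _, run in groupby(ordered, key=lambda r: (r[0], r[1]))]


result = first_of_each_run(rows)
for row in result:
    print(row)
```

This prints the four rows dated 20000101, 20020101, 20040101 and 20060101, matching the table above.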

Is there a reasonable way to achieve this?

I am doing this in MySQL, although a solution that does not depend on it would be ideal.

Answer 1

Using the method in this answer to simulate lag() in MySQL:

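SET @prev_group = NULL, @prev_value = NULL;
select groupId, value, date
from (
  -- prev_* carry the previous row's groupId and value; this relies on
  -- left-to-right evaluation of user variables, which MySQL does not guarantee
  select @prev_group prev_group, @prev_group := groupId groupId,
         @prev_value prev_value, @prev_value := value value, date
  from t
  order by groupId, date
  ) a
-- keep a row when it starts a new group or its value differs from the previous row
where prev_group is null or prev_group <> groupId or prev_value <> value;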

rextester demo: http://rextester.com/PWF35736

returns:

+---------+-------+----------+
| groupId | value |   date   |
+---------+-------+----------+
|      45 | a     | 20000101 |
|      45 | b     | 20020101 |
|      45 | a     | 20040101 |
|      45 | b     | 20060101 |
+---------+-------+----------+
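
Where a real window function is available, the same idea can be written without user variables. The sketch below is an assumption-laden illustration rather than the method from the linked answer: it runs the query against an in-memory SQLite database from Python (this needs SQLite 3.25 or later for LAG), and it assumes value is never NULL. MySQL 8.0 and later also support LAG() with PARTITION BY. The sample data is groups 44 and 45 from the question; group 44 ends with a and group 45 starts with a, which shows why the partitioning by groupId matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (groupId INTEGER, value TEXT, date INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (44, "a", 20000101), (44, "b", 20010101), (44, "a", 20020101),
        (45, "a", 20000101), (45, "a", 20010101),
        (45, "b", 20020101), (45, "b", 20030101),
        (45, "a", 20040101), (45, "a", 20050101),
        (45, "b", 20060101), (45, "b", 20070101),
    ],
)

# LAG(value) is the previous row's value within the same groupId, ordered by date.
# It is NULL for the first row of each group, so that row is always kept.
query = """
SELECT groupId, value, date
FROM (
    SELECT groupId, value, date,
           LAG(value) OVER (PARTITION BY groupId ORDER BY date) AS prev_value
    FROM t
) AS a
WHERE prev_value IS NULL OR prev_value <> value
ORDER BY groupId, date
"""

lag_rows = conn.execute(query).fetchall()
for row in lag_rows:
    print(row)
```

The first row of group 45 is kept even though the last row of group 44 has the same value, because the comparison never crosses a group boundary.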

Answer 2

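First, we need to build information into the table itself that tells us when a change is immaterial. In this case, we know that a record is immaterial when two identical values appear next to each other in time. We can do this by assigning a "rank" that groups immaterial and material records together: for each row, count the earlier rows in the same group that have a different value. Rows in the same consecutive run share that count, while a later run of the same value gets a higher one. Assuming our table is called A, the following query: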

select    a1.groupID
        , a1.value
        , a1.date
        , COUNT(a2.groupID) as Ranked
    from A a1
    left join A a2
        on a2.groupID = a1.groupID
        and a2.value <> a1.value
        and a2.date < a1.date
    group by  a1.groupID
            , a1.value
            , a1.date
    order by  a1.date

produces this table:

+ ------- + ----- + ---------- + ------ +
| groupId | value | date       | Ranked |
+ ------- + ----- + ---------- + ------ +
| 45      | a     | 2000-01-01 | 0      |
| 45      | a     | 2001-01-01 | 0      |
| 45      | b     | 2002-01-01 | 2      |
| 45      | b     | 2003-01-01 | 2      |
| 45      | a     | 2004-01-01 | 2      |
| 45      | a     | 2005-01-01 | 2      |
| 45      | b     | 2006-01-01 | 4      |
| 45      | b     | 2007-01-01 | 4      |
+ ------- + ----- + ---------- + ------ +

Then by grouping on groupId, value, and Ranked, we can select the min(date). Since MySQL versions before 8.0 do not support CTEs, we'll just use a temporary table:

create temporary table Ranking as (
    select    a1.groupID
            , a1.value
            , a1.date
            , COUNT(a2.groupID) as Ranked
        from A a1
        left join A a2
            on a2.groupID = a1.groupID
            and a2.value <> a1.value
            and a2.date < a1.date
        group by  a1.groupID
                , a1.value
                , a1.date
        order by  a1.date
);

select    groupId
        , value
        , min(date) as date
    from Ranking
    group by  groupId
            , value
            , ranked
    order by date

and voila, we get the desired result

+ ------- + ----- + ---------- +
| groupId | value | date       |
+ ------- + ----- + ---------- +
| 45      | a     | 2000-01-01 |
| 45      | b     | 2002-01-01 |
| 45      | a     | 2004-01-01 |
| 45      | b     | 2006-01-01 |
+ ------- + ----- + ---------- +
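
The two steps can also be folded into a single statement by using the ranking query as a derived table, which avoids both the temporary table and the need for CTE support. This is a sketch under a few assumptions: it is run here against an in-memory SQLite database from Python purely so it can be checked, dates are stored as integers as in the question, the min(date) column is renamed first_date to keep the ORDER BY unambiguous, and dates are assumed to be unique within a group. The self-join compares every row with all earlier rows of its group, so it may be slow on large groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (groupId INTEGER, value TEXT, date INTEGER)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [
        (44, "a", 20000101), (44, "b", 20010101), (44, "a", 20020101),
        (45, "a", 20000101), (45, "a", 20010101),
        (45, "b", 20020101), (45, "b", 20030101),
        (45, "a", 20040101), (45, "a", 20050101),
        (45, "b", 20060101), (45, "b", 20070101),
    ],
)

# Inner query: ranked = number of earlier rows in the same group with a different value.
# Outer query: one row per (groupId, value, ranked), keeping the earliest date.
query = """
SELECT groupId, value, MIN(date) AS first_date
FROM (
    SELECT a1.groupId, a1.value, a1.date, COUNT(a2.groupId) AS ranked
    FROM A a1
    LEFT JOIN A a2
           ON a2.groupId = a1.groupId
          AND a2.value <> a1.value
          AND a2.date < a1.date
    GROUP BY a1.groupId, a1.value, a1.date
) AS ranking
GROUP BY groupId, value, ranked
ORDER BY groupId, first_date
"""

ranked_rows = conn.execute(query).fetchall()
for row in ranked_rows:
    print(row)
```

For group 45 this yields the same four rows as the table above; group 44 keeps all three of its rows, since each one is a real change.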

Select the minimum value within consecutive groups in a SQL query (MySQL)
