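I have an awkward situation: I have a table that stores data about how values change over time.
I have a column groupId, which groups together the changes to a given value. I have value, which stores what the value was changed to, and date, which stores the date on which the change occurred.
E.g., if the value changed to a on 20000101 and to b on 20010101, we might have the following: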
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 42 | a | 20000101 |
| 42 | b | 20010101 |
+---------+-------+----------+
Now, to make things interesting, we can have records that do not represent a material change in the value, e.g.:
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 43 | a | 20000101 |
| 43 | b | 20010101 |
| 43 | b | 20020101 |
+---------+-------+----------+
For extra fun, a value can change to something else and then change back to what it was before, e.g.:
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 44 | a | 20000101 |
| 44 | b | 20010101 |
| 44 | a | 20020101 |
+---------+-------+----------+
Putting these together, we can have a group that looks like this:
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 45 | a | 20000101 |
| 45 | a | 20010101 |
| 45 | b | 20020101 |
| 45 | b | 20030101 |
| 45 | a | 20040101 |
| 45 | a | 20050101 |
| 45 | b | 20060101 |
| 45 | b | 20070101 |
+---------+-------+----------+
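What I need to do is write a query that returns the rows for each group but discards any of these immaterial changes. For group 45 above, this would mean returning: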
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 45 | a | 20000101 |
| 45 | b | 20020101 |
| 45 | a | 20040101 |
| 45 | b | 20060101 |
+---------+-------+----------+
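I.e., we keep only the earliest date for each "contiguous" group of (groupId, value).
Is there a reasonable way to achieve this?
I'm doing this in MySQL, though a solution that doesn't depend on it would be ideal.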
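To pin down the expected behaviour independently of any SQL dialect, here is a minimal reference sketch in Python. It is not a query, only a statement of the rule: sort by (groupId, date), then keep the first row of every contiguous run of equal (groupId, value). The names `ROWS` and `material_changes` are made up for illustration.

```python
from itertools import groupby

# Group 45 from the example above, as (groupId, value, date) tuples.
ROWS = [
    (45, "a", "20000101"), (45, "a", "20010101"),
    (45, "b", "20020101"), (45, "b", "20030101"),
    (45, "a", "20040101"), (45, "a", "20050101"),
    (45, "b", "20060101"), (45, "b", "20070101"),
]


def material_changes(rows):
    """Return the earliest row of each contiguous run of (groupId, value)."""
    # Order each group's changes by date so that runs are adjacent.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    # groupby only merges *consecutive* equal keys, which is exactly
    # the "contiguous group" notion; the first row of a run is the earliest.
    return [next(run) for _, run in groupby(ordered, key=lambda r: (r[0], r[1]))]


for row in material_changes(ROWS):
    print(row)
```

Any SQL answer should return the same four rows for group 45.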
Answer 0 (score: 0)
Using the method in this answer to simulate lag() in MySQL:
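SET @prev_value = NULL, @prev_group = NULL;
select groupId, value, date
from (
  -- is_change is evaluated before the variables are updated, so it compares against the previous row;
  -- tracking the group as well keeps the first row of a group whose value equals the previous group's last value
  select (@prev_group is null or @prev_group <> groupId or @prev_value <> value) is_change,
         @prev_group := groupId groupId, @prev_value := value value, date
  from t
  order by groupId, date
) a
where is_change;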
rextester demo: http://rextester.com/PWF35736
returns:
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 45 | a | 20000101 |
| 45 | b | 20020101 |
| 45 | a | 20040101 |
| 45 | b | 20060101 |
+---------+-------+----------+
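Where real window functions are available (MySQL 8.0 and later, PostgreSQL, SQLite 3.25 and later), lag() does not have to be simulated. The following is a minimal sketch under that assumption, not a tested MySQL script: the SQL is run through Python's sqlite3 module purely so that the example is self-contained. The table name `t` follows the query above; `ROWS` and `material_changes` are illustrative names.

```python
import sqlite3

# Groups 44 and 45 from the question. Group 44 ends with 'a' and
# group 45 starts with 'a', which exercises the group boundary.
ROWS = [
    (44, "a", "20000101"), (44, "b", "20010101"), (44, "a", "20020101"),
    (45, "a", "20000101"), (45, "a", "20010101"),
    (45, "b", "20020101"), (45, "b", "20030101"),
    (45, "a", "20040101"), (45, "a", "20050101"),
    (45, "b", "20060101"), (45, "b", "20070101"),
]

# PARTITION BY restarts LAG() for every group, so the first row of a
# group has prev_value NULL and is always kept.
QUERY = """
SELECT groupId, value, date
FROM (
    SELECT groupId, value, date,
           LAG(value) OVER (PARTITION BY groupId ORDER BY date) AS prev_value
    FROM t
) AS x
WHERE prev_value IS NULL OR prev_value <> value
ORDER BY groupId, date
"""


def material_changes(rows):
    """Load rows into an in-memory table and return only the material changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (groupId INTEGER, value TEXT, date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in material_changes(ROWS):
    print(row)
```

Because the comparison is against the previous row within the same partition, no session variables are involved and the result does not depend on expression evaluation order.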
Answer 1 (score: 0)
First, we need to build information into the table itself that tells us when a change is immaterial. In this case, we know that a record is immaterial when two identical values appear next to each other in time. We can do this by assigning a "rank" that groups immaterial and material records together: the rank of a row is the number of earlier rows in the same group that have a different value, so it stays constant within a run of identical values. Assuming our table is called A, the following query:
select a1.groupID
, a1.value
, a1.date
, COUNT(a2.groupID) as Ranked
from A a1
left join A a2
on a2.groupID = a1.groupID
and a2.value <> a1.value
and a2.date < a1.date
group by a1.groupID
, a1.value
, a1.date
order by a1.date
produces this table:
+ ------- + ----- + ---------- + ------ +
| groupId | value | date | Ranked |
+ ------- + ----- + ---------- + ------ +
| 45 | a | 2000-01-01 | 0 |
| 45 | a | 2001-01-01 | 0 |
| 45 | b | 2002-01-01 | 2 |
| 45 | b | 2003-01-01 | 2 |
| 45 | a | 2004-01-01 | 2 |
| 45 | a | 2005-01-01 | 2 |
| 45 | b | 2006-01-01 | 4 |
| 45 | b | 2007-01-01 | 4 |
+ ------- + ----- + ---------- + ------ +
Then, by grouping on groupId, value, and Ranked, we can select min(date) for each run. Since MySQL versions before 8.0 do not support CTEs, we'll just use a temporary table:
create temporary table Ranking as (
select a1.groupID
, a1.value
, a1.date
, COUNT(a2.groupID) as Ranked
from A a1
left join A a2
on a2.groupID = a1.groupID
and a2.value <> a1.value
and a2.date < a1.date
group by a1.groupID
, a1.value
, a1.date
order by a1.date
);
select groupId
, value
, min(date) as date
from Ranking
group by groupId
, value
, ranked
order by date
And voilà, we get the desired result:
+ ------- + ----- + ---------- +
| groupId | value | date |
+ ------- + ----- + ---------- +
| 45 | a | 2000-01-01 |
| 45 | b | 2002-01-01 |
| 45 | a | 2004-01-01 |
| 45 | b | 2006-01-01 |
+ ------- + ----- + ---------- +
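The two steps can also be folded into one statement by using the ranking query as a derived table, which avoids the temporary table and uses neither window functions nor session variables. This is a minimal sketch, run through Python's sqlite3 module only so that it is self-contained; the table name `A` follows the answer, while `first_date`, `ROWS` and `material_changes` are illustrative names.

```python
import sqlite3

# Groups 44 and 45 from the question, as (groupId, value, date).
ROWS = [
    (44, "a", "20000101"), (44, "b", "20010101"), (44, "a", "20020101"),
    (45, "a", "20000101"), (45, "a", "20010101"),
    (45, "b", "20020101"), (45, "b", "20030101"),
    (45, "a", "20040101"), (45, "a", "20050101"),
    (45, "b", "20060101"), (45, "b", "20070101"),
]

# Inner query: rank = number of earlier rows in the same group with a
# different value. Outer query: earliest date per (groupId, value, rank).
QUERY = """
SELECT groupId, value, MIN(date) AS first_date
FROM (
    SELECT a1.groupId, a1.value, a1.date,
           COUNT(a2.groupId) AS ranked
    FROM A a1
    LEFT JOIN A a2
           ON a2.groupId = a1.groupId
          AND a2.value <> a1.value
          AND a2.date < a1.date
    GROUP BY a1.groupId, a1.value, a1.date
) AS ranking
GROUP BY groupId, value, ranked
ORDER BY groupId, first_date
"""


def material_changes(rows):
    """Load rows into an in-memory table A and run the single-statement query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (groupId INTEGER, value TEXT, date TEXT)")
    conn.executemany("INSERT INTO A VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in material_changes(ROWS):
    print(row)
```

Note that the self-join compares each row with every earlier row of its group, so the work grows roughly quadratically with the number of changes per group; for long histories a window-function approach is likely cheaper.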