Question

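I am having trouble writing a query that selects the last "new" sequentially distinct value of one column (call it Col A) within each group defined by another column (Col B). Since that is a bit ambiguous and confusing, here is an example to illustrate (assume the row number indicates the sequence within a group; in my actual problem the rows are ordered by date):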

|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1      | A     | A     |
| 2      | B     | A     |
| 3      | C     | A     |
| 4      | B     | B     |
| 5      | A     | B     |
| 6      | B     | B     |

The query would select:

| 3      | C     | A     |
| 6      | B     | B     |

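Note that although B also appears in row 4, the fact that row 5 contains A means the B in row 6 is sequentially distinct. But if the table looked like this: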

|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1      | A     | A     |
| 2      | B     | A     |
| 3      | C     | A     |
| 4      | B     | B     |
| 5      | A     | B     |
| 6      | A     | B     | <--

then we would want to select:

| 3      | C     | A     |
| 5      | A     | B     |

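I think this would be an easier problem if I only cared about values being distinct, rather than sequentially distinct. I'm not really sure how to take sequence into account when writing a query.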

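I have tried to solve this by calculating the min/max row number at which each value of Col A appears. That calculation (using the second sample table) would produce a result like this: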

|--------|--------|--------|--------|
| ColA   | ColB   | MinRow | MaxRow |
|--------|--------|--------|--------|
| A      | A      | 1      | 1      |
| B      | A      | 2      | 2      |
| C      | A      | 3      | 3      | 
| A      | B      | 5      | 6      |
| B      | B      | 4      | 4      |
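
For reference, that min/max calculation can be sketched as a plain GROUP BY. The inline values list and the names t, rownum, colA and colB are assumptions standing in for the real table and columns; this is only an illustration of the attempt, not a solution:

```sql
-- Sketch only: the VALUES list stands in for the real table (second sample table).
select colA,
       colB,
       min(rownum) as MinRow,  -- first row where this colA value appears in the group
       max(rownum) as MaxRow   -- last row where this colA value appears in the group
from (values (1, 'A', 'A'),
             (2, 'B', 'A'),
             (3, 'C', 'A'),
             (4, 'B', 'B'),
             (5, 'A', 'B'),
             (6, 'A', 'B')
     ) as t (rownum, colA, colB)
group by colA, colB
order by colB, colA;
```

Because the grouping is on the value alone, two separate runs of the same value within a group would collapse into one row, which is why this approach loses the sequence information.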

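The solution proposed in a related post (SQL: Select Row with Last New Sequentially Distinct Value) took a similar approach: it essentially found the most recent RowNum whose ColA differed from the last ColA value, then selected the next row. However, in that question I failed to mention that the query needs to work across multiple groups, hence this new post.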

Any help with this problem, if it can be done in SQL at all, would be greatly appreciated. I am running SQL Server 2008 SP4.

Answer 1

Hmm... One method is to get the last value for each group, then pick the trailing rows that all have that value and aggregate:

select min(rownum), colA, colB
from (select t.*,
             first_value(colA) over (partition by colB order by rownum desc) as last_colA
      from t
     ) t
where rownum > all (select t2.rownum
                    from t t2
                    where t2.colB = t.colB and t2.colA <> t.last_colA
                   )
group by colA, colB;

Or, without aggregation:

select t.*
from (select t.*,
             first_value(colA) over (partition by colB order by rownum desc) as last_colA,
             lag(colA) over (partition by colB order by rownum) as prev_colA
      from t
     ) t
where rownum > all (select t2.rownum
                    from t t2
                    where t2.colB = t.colB and t2.colA <> t.last_colA
                   ) and
      (prev_colA is null or prev_colA <> colA);

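But SQL Server 2008 supports neither first_value() nor lag() (both arrived in SQL Server 2012), so there let's treat this as a gaps-and-islands problem: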

select t.*
from (select t.*,
             min(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as min_rownum_group,
             max(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as max_rownum_group
      from (select t.*,
                   row_number() over (partition by colB order by rownum) as seqnum_b,
                   row_number() over (partition by colB, colA order by rownum) as seqnum_ab,
                   max(rownum) over (partition by colB) as max_rownum  -- no order by: max over the whole colB partition
            from t
           ) t
     ) t
where rownum = min_rownum_group and  -- first row of each island of adjacent equal colA within a colB
      max_rownum_group = max_rownum;  -- keep only the last island for each colB

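This identifies each island (a run of adjacent rows with the same colA within a colB) using a difference of row numbers. It then calculates the maximum rownum both for the island and for the whole colB group. For the last island in each group, these two values are the same.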
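
To make the difference-of-row-numbers idea concrete, here is a small self-contained sketch that applies the islands logic to the second sample table from the question. The inline values list stands in for the real table, and the CTE names numbered and islands and the output column island are names introduced here for illustration only:

```sql
-- Sketch only: the VALUES list stands in for the real table t.
with t as (
    select v.rownum, v.colA, v.colB
    from (values (1, 'A', 'A'),
                 (2, 'B', 'A'),
                 (3, 'C', 'A'),
                 (4, 'B', 'B'),
                 (5, 'A', 'B'),
                 (6, 'A', 'B')
         ) as v (rownum, colA, colB)
),
numbered as (
    select t.*,
           -- position within the colB group
           row_number() over (partition by colB order by rownum) as seqnum_b,
           -- position among rows with the same colA inside that colB group
           row_number() over (partition by colB, colA order by rownum) as seqnum_ab,
           -- last rownum of the whole colB group (no order by, so not a running max)
           max(rownum) over (partition by colB) as max_rownum
    from t
),
islands as (
    select n.*,
           -- the difference is constant across a run of adjacent equal colA values
           min(rownum) over (partition by colB, colA, seqnum_b - seqnum_ab) as min_rownum_group,
           max(rownum) over (partition by colB, colA, seqnum_b - seqnum_ab) as max_rownum_group
    from numbered n
)
select rownum, colA, colB, seqnum_b - seqnum_ab as island
from islands
where rownum = min_rownum_group        -- first row of its island
  and max_rownum_group = max_rownum    -- island reaches the end of the colB group
order by rownum;
-- returns (3, C, A, 2) and (5, A, B, 1)
```

For colB = B, rows 4, 5 and 6 get differences 0, 1 and 1, so rows 5 and 6 form a single island whose first row is row 5. With the first sample table (row 6 holding B instead of A) the differences are still 0, 1 and 1, but row 6 has a different colA from row 5, so it forms an island of its own and is the row selected.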
