Multiple joins on the same dataset

Time: 2014-10-06 10:48:49

Tags: sql tsql select

I'm writing a complex query in T-SQL. My dataset is a single table (let's call it Table T), and the data looks like this:

| Happened                | Contributor | Status | Direction | Purchased |
|-------------------------|-------------|--------|-----------|-----------|
| 2014-10-06 01:00:00.000 | A           | 0      | NULL      | NULL      |
| 2014-10-06 02:00:00.000 | A           | 1      | NULL      | NULL      |
| 2014-10-06 03:00:00.000 | A           | 2      | inbound   | NULL      |
| 2014-10-06 04:00:00.000 | A           | 0      | NULL      | yes       |
| 2014-10-06 05:00:00.000 | A           | 2      | outbound  | yes       |
| 2014-10-06 06:00:00.000 | B           | 2      | inbound   | NULL      |

So I have:

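  • a unique pair of a date (T.Happened) and the subject that caused the event (T.Contributor);
  • the status at the time of the event;
  • a direction, which can be inbound or outbound (a boolean-ish column);
  • a purchase marker (boolean-ish as well).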

What I need is a query that, for each distinct contributor, selects:

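  • the earliest datetime at which the status changed from 0 to 1;
  • the earliest datetime at which the status was not 0 and T.Direction = 'inbound';
  • the number of events where T.Purchased = 'yes' and the status was not 0.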

It should show the whole set of contributors, even when the other fields in a row are empty.
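To pin down what each requested column means before writing any SQL, here is a plain-Python sketch over the sample rows. It assumes that "changed from 0 to 1" means the immediately preceding event of the same contributor had status 0, and it returns both the earliest purchase datetime (as in the expected result table) and the purchase count (as in the list above). `ROWS` and `summarize` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (happened, contributor, status, direction, purchased).
# The timestamps are ISO-8601 strings, so string order is chronological order.
ROWS = [
    ("2014-10-06 01:00:00.000", "A", 0, None, None),
    ("2014-10-06 02:00:00.000", "A", 1, None, None),
    ("2014-10-06 03:00:00.000", "A", 2, "inbound", None),
    ("2014-10-06 04:00:00.000", "A", 0, None, "yes"),
    ("2014-10-06 05:00:00.000", "A", 2, "outbound", "yes"),
    ("2014-10-06 06:00:00.000", "B", 2, "inbound", None),
]


def summarize(rows):
    """Map contributor -> (first 0->1 change, first inbound, first purchase, purchase count)."""
    by_contributor = defaultdict(list)
    for row in rows:
        by_contributor[row[1]].append(row)

    result = {}
    for contributor, events in sorted(by_contributor.items()):
        events.sort(key=lambda r: r[0])  # chronological order per contributor
        first_change = first_inbound = first_purchase = None
        purchase_count = 0
        prev_status = None  # status of the immediately preceding event, if any
        for happened, _, status, direction, purchased in events:
            if status == 1 and prev_status == 0 and first_change is None:
                first_change = happened
            if status != 0 and direction == "inbound" and first_inbound is None:
                first_inbound = happened
            if status != 0 and purchased == "yes":
                purchase_count += 1
                if first_purchase is None:
                    first_purchase = happened
            prev_status = status
        # Every contributor gets an entry, even when all its values are None.
        result[contributor] = (first_change, first_inbound, first_purchase, purchase_count)
    return result


for contributor, values in summarize(ROWS).items():
    print(contributor, values)
```

Because the result is keyed by contributor rather than built from matching rows, contributor B still appears with `None` in the columns that have no qualifying event, which is the behavior the question asks the SQL to reproduce.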

What I tried was multiple joins, something like:

...
FROM [Table] T
JOIN [Table] T2 ON
    (T.Contributor = T2.Contributor
    AND T.Happened < T2.Happened
    AND T2.Status = 1
    AND T.Status = 0)
JOIN [Table] T3 ON
...

The resulting dataset should then look like this:

| Contributor | StatusChangedFrom0To1   | StatusWasNot0AndDirectionWasInbound | StatusWasNot0AndPuchasedWasYes |
|-------------|-------------------------|-------------------------------------|--------------------------------|
| A           | 2014-10-06 02:00:00.000 | 2014-10-06 03:00:00.000             | 2014-10-06 05:00:00.000        |
| B           | NULL                    | 2014-10-06 06:00:00.000             | NULL                           |

What approach should I follow, and in what direction should I go to get the desired result? Should I use some kind of join (e.g. a full outer join)?

I'm on MS SQL Server 2008 and I'm tied to this version for, you know, "enterprise and stuff" reasons, so upgrading to anything newer is unlikely.

1 answer:

Answer 0 (score: 1)

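If I assume that the status only increases, then the earliest time the status changed from 0 to 1 is simply the earliest time the status is 1 (which gives the right answer for the sample data in your question):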

select contributor,
       min(case when status = 1 then happened end) as StatusChangedFrom0To1,
       min(case when status <> 0 and direction = 'inbound' then happened end) as StatusWasNot0AndDirectionWasInbound,
       min(case when status <> 0 and purchased = 'yes' then happened end) as StatusWasNot0AndPuchasedWasYes,
       sum(case when status <> 0 and purchased = 'yes' then 1 else 0 end) as cnt
from [Table] t
group by contributor;
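To check this conditional aggregation against the expected result, here is a small sketch that runs the same query on the sample rows. It assumes SQLite (through Python's built-in `sqlite3` module) as a stand-in for SQL Server, stores Happened as ISO-8601 text (which sorts chronologically), names the table `t` (hypothetical) to sidestep the reserved word TABLE, and adds `order by contributor` only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; the table name t is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (happened text, contributor text, status int, direction text, purchased text)")
conn.executemany("insert into t values (?, ?, ?, ?, ?)", [
    ("2014-10-06 01:00:00.000", "A", 0, None, None),
    ("2014-10-06 02:00:00.000", "A", 1, None, None),
    ("2014-10-06 03:00:00.000", "A", 2, "inbound", None),
    ("2014-10-06 04:00:00.000", "A", 0, None, "yes"),
    ("2014-10-06 05:00:00.000", "A", 2, "outbound", "yes"),
    ("2014-10-06 06:00:00.000", "B", 2, "inbound", None),
])

# Conditional aggregation: CASE without ELSE yields NULL, and min() ignores NULLs,
# so each column is the earliest matching timestamp, or NULL if nothing matches.
QUERY = """
select contributor,
       min(case when status = 1 then happened end) as StatusChangedFrom0To1,
       min(case when status <> 0 and direction = 'inbound' then happened end) as StatusWasNot0AndDirectionWasInbound,
       min(case when status <> 0 and purchased = 'yes' then happened end) as StatusWasNot0AndPuchasedWasYes,
       sum(case when status <> 0 and purchased = 'yes' then 1 else 0 end) as cnt
from t
group by contributor
order by contributor
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

A single pass with `group by` keeps one row per contributor, so B survives with NULLs without needing any outer join.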

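If the assumption behind StatusChangedFrom0To1 does not hold, the query can still be a conditional aggregation, but getting that column takes extra work. In SQL Server 2012+, you can use lag() for this: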

select contributor,
       min(case when status = 1 and prevstatus = 0 then happened end) as StatusChangedFrom0To1,
       min(case when status <> 0 and direction = 'inbound' then happened end) as StatusWasNot0AndDirectionWasInbound,
       min(case when status <> 0 and purchased = 'yes' then happened end) as StatusWasNot0AndPuchasedWasYes,
       sum(case when status <> 0 and purchased = 'yes' then 1 else 0 end) as cnt
from (select t.*, lag(status) over (partition by Contributor order by happened) as prevstatus
      from [Table] t
     ) t
group by contributor;
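The two queries only disagree when the status does not simply increase. The sketch below again uses SQLite through Python's `sqlite3` as a stand-in for SQL Server (SQLite needs version 3.25 or later for `lag()`), with a table named `t` and a contributor C that are both hypothetical: C's status reaches 1 from 2 before it ever changes from 0 to 1.

```python
import sqlite3

# Requires SQLite 3.25+ for window functions such as lag().
conn = sqlite3.connect(":memory:")
conn.execute("create table t (happened text, contributor text, status int)")
# Hypothetical contributor C: status goes 2 -> 1 first, and only later 0 -> 1.
conn.executemany("insert into t values (?, ?, ?)", [
    ("2014-10-06 01:00:00.000", "C", 2),
    ("2014-10-06 02:00:00.000", "C", 1),
    ("2014-10-06 03:00:00.000", "C", 0),
    ("2014-10-06 04:00:00.000", "C", 1),
])

# Shortcut from the first query: earliest row with status = 1, whatever preceded it.
NAIVE = "select contributor, min(case when status = 1 then happened end) from t group by contributor"

# lag() version: only status = 1 rows whose previous row (same contributor) had status 0.
WITH_LAG = """
select contributor,
       min(case when status = 1 and prevstatus = 0 then happened end)
from (select t.*, lag(status) over (partition by contributor order by happened) as prevstatus
      from t
     ) t
group by contributor
"""

naive = conn.execute(NAIVE).fetchall()
with_lag = conn.execute(WITH_LAG).fetchall()
print(naive)     # [('C', '2014-10-06 02:00:00.000')]
print(with_lag)  # [('C', '2014-10-06 04:00:00.000')]
```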

In earlier versions, I would use a correlated subquery to get the equivalent functionality.

Edit:

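With the correlated subquery taking the place of lag(), the full query looks like this:

select contributor,
       min(case when status = 1 and prevstatus = 0 then happened end) as StatusChangedFrom0To1,
       min(case when status <> 0 and direction = 'inbound' then happened end) as StatusWasNot0AndDirectionWasInbound,
       min(case when status <> 0 and purchased = 'yes' then happened end) as StatusWasNot0AndPuchasedWasYes,
       sum(case when status <> 0 and purchased = 'yes' then 1 else 0 end) as cnt
from (select t.*,
             -- previous status of the same contributor (emulates lag() before SQL Server 2012)
             (select top 1 t2.status
              from [Table] t2
              where t.contributor = t2.contributor and t2.happened < t.happened
              order by t2.happened desc
             ) as prevstatus
      from [Table] t
     ) t
group by contributor;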
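The correlated-subquery version needs no window functions, so it can be checked the same way on any SQLite build. In this sketch SQLite (via Python's `sqlite3`) again stands in for SQL Server, `TOP 1` is written as `LIMIT 1` because SQLite has no `TOP`, the table is named `t` (hypothetical), and the data is the question's sample plus the hypothetical, non-monotonic contributor C.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (happened text, contributor text, status int, direction text, purchased text)")
conn.executemany("insert into t values (?, ?, ?, ?, ?)", [
    # Sample rows from the question.
    ("2014-10-06 01:00:00.000", "A", 0, None, None),
    ("2014-10-06 02:00:00.000", "A", 1, None, None),
    ("2014-10-06 03:00:00.000", "A", 2, "inbound", None),
    ("2014-10-06 04:00:00.000", "A", 0, None, "yes"),
    ("2014-10-06 05:00:00.000", "A", 2, "outbound", "yes"),
    ("2014-10-06 06:00:00.000", "B", 2, "inbound", None),
    # Hypothetical contributor C whose status is not monotonic.
    ("2014-10-06 01:00:00.000", "C", 2, None, None),
    ("2014-10-06 02:00:00.000", "C", 1, None, None),
    ("2014-10-06 03:00:00.000", "C", 0, None, None),
    ("2014-10-06 04:00:00.000", "C", 1, None, None),
])

# Same shape as the SQL Server 2008 query; the scalar subquery fetches the status of
# the latest earlier event for the same contributor (NULL for a contributor's first event).
QUERY = """
select contributor,
       min(case when status = 1 and prevstatus = 0 then happened end) as StatusChangedFrom0To1,
       min(case when status <> 0 and direction = 'inbound' then happened end) as StatusWasNot0AndDirectionWasInbound,
       min(case when status <> 0 and purchased = 'yes' then happened end) as StatusWasNot0AndPuchasedWasYes,
       sum(case when status <> 0 and purchased = 'yes' then 1 else 0 end) as cnt
from (select t.*,
             (select t2.status
              from t t2
              where t.contributor = t2.contributor and t2.happened < t.happened
              order by t2.happened desc
              limit 1
             ) as prevstatus
      from t
     ) t
group by contributor
order by contributor
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

C's transition is reported at 04:00 rather than 02:00, matching the lag() behavior, while A and B come out as in the expected result table.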