Grouping results when using nested queries

Asked: 2016-04-07 14:41:43

Tags: tsql grouping nested-queries

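I'm trying to get a 6-month trend of the entities in our database that meet certain criteria, but the problem is that I have to drill down several levels of nesting to determine whether an entity meets them.

The entities are "members", each of which may have multiple "accounts", and I need to make sure that none of a member's accounts has any flags set before I include that member.

If I wanted a count for one particular date (we keep historical data), I would do something like this: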

SELECT COUNT(sup.SSN) 
FROM MemberSuppTable as sup 
WHERE  (
  sup.ProcessDate = @PROCESSDATE
  AND sup.MemberSuppID IN (
    SELECT summ.MemberSuppID 
    FROM MemberSummaryTable as summ
    WHERE  (
      summ.ProcessDate = @PROCESSDATE
      AND summ.AccountNumber IN (
        SELECT acct.AccountNumber 
        FROM AccountTable as acct
        WHERE ( 
          acct.ProcessDate = @PROCESSDATE
          --other criteria for account exclusion go here. 
        )
      )
    )
  )
)

MemberSuppTable holds high-level information about members:

(ID, FirstAccountOpenDate, status, etc)

MemberSummaryTable links accounts to the members in MemberSuppTable:

(AccountNumber, MemberSuppID, ...)

Now I'm trying to get the counts for the month-end process dates, grouped by process date, in a single query.

So where the query above returns

ssn count
----------
1,000,000

I want:

process date | ssn count
------------------------
20160430     | 8,000,000
20160531     | 8,500,000
...          | ...
20160331     | 1,000,000

So far I have come up with the following (see below for why it doesn't work):

WITH valid_dates AS (
  SELECT D.ProcessDate 
  FROM arcu.vwARCUProcessDates AS D 
  WHERE d.FullDate = D.MonthEndDate 
    AND d.ProcessDate >= @SDATE
)


SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN) 
FROM MemberSuppTable as sup 
WHERE (
  sup.ProcessDate IN (SELECT * FROM valid_dates)
  AND sup.MemberSuppID IN (
    SELECT summ.MemberSuppID
    FROM MemberSummaryTable as summ
    WHERE  (
      summ.ProcessDate IN (SELECT * FROM valid_dates)
      AND summ.AccountNumber IN (
        SELECT acct.AccountNumber 
        FROM AccountTable as acct
        WHERE ( 
          acct.ProcessDate IN (SELECT * FROM valid_dates)
          ...
        )
      )
    )
  )
)
GROUP BY (sup.ProcessDate)

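However, with the query above, I believe that a member who meets the criteria on any one process date in valid_dates will be included in all of the groups.

Can anyone help me? (I'm new to SQL, so please forgive me if I'm missing something simple.)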
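That suspicion can be checked on a tiny data set. The following is only a sketch: the table variables, the int process dates, and the sample rows are hypothetical stand-ins for the real tables, and the account level is left out to keep it short.

```sql
-- Hypothetical data: member 1 has snapshots on two month-ends,
-- but a linked account only on 20160331.
DECLARE @sup  TABLE (ProcessDate int, MemberSuppID int, SSN char(3));
DECLARE @summ TABLE (ProcessDate int, MemberSuppID int, AccountNumber int);

INSERT INTO @sup  VALUES (20160229, 1, '111'), (20160331, 1, '111');
INSERT INTO @summ VALUES (20160331, 1, 10);

-- The inner IN is not tied to sup.ProcessDate, so the 20160331 account
-- also qualifies the 20160229 snapshot of the same member.
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN) AS SsnCount
FROM @sup AS sup
WHERE sup.ProcessDate IN (20160229, 20160331)
  AND sup.MemberSuppID IN (
      SELECT summ.MemberSuppID
      FROM @summ AS summ
      WHERE summ.ProcessDate IN (20160229, 20160331)
  )
GROUP BY sup.ProcessDate;
-- returns two rows, (20160229, 1) and (20160331, 1),
-- although the member has an account on 20160331 only
```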

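2 Answers:

Answer 0: (score: 1)

The IN clause is perfectly fine for a query like this. It is more readable than joins, because it shows clearly which table you select data from and which tables you only access to check whether a record exists. The structure is good and shows that you have put some thought into the query.

However, your query would be more readable without the unnecessary aliases and parentheses.

Anyway, I guess you want to use the same process date that you found in the subqueries, so enhance your IN clauses accordingly: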

select processdate, count(distinct ssn) 
from membersupptable 
where (processdate, membersuppid) in 
(
  select processdate, membersuppid
  from membersummarytable
  where (processdate, accountnumber) in
  (
    select processdate, accountnumber 
    from accounttable
    where processdate in 
    (
      select processdate 
      from vwarcuprocessdates
      where fulldate = monthenddate 
      and processdate >= @sdate
    )
  )
)
group by processdate;
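One caveat for the tsql tag: SQL Server does not accept a column tuple such as (processdate, membersuppid) on the left side of IN, so the query above fits engines that support row-value comparisons (Oracle, PostgreSQL, MySQL) but not T-SQL. A correlated EXISTS expresses the same "same process date at every level" idea. The sketch below is not part of the original answer; it runs on hypothetical table variables with made-up rows, and it leaves out the month-end filter on vwARCUProcessDates for brevity.

```sql
-- Hypothetical data: member 1 has snapshots on two month-ends,
-- but a linked account only on 20160331.
DECLARE @sup  TABLE (ProcessDate int, MemberSuppID int, SSN char(3));
DECLARE @summ TABLE (ProcessDate int, MemberSuppID int, AccountNumber int);
DECLARE @acct TABLE (ProcessDate int, AccountNumber int);

INSERT INTO @sup  VALUES (20160229, 1, '111'), (20160331, 1, '111');
INSERT INTO @summ VALUES (20160331, 1, 10);
INSERT INTO @acct VALUES (20160331, 10);

-- Correlated EXISTS: every level has to match on the same ProcessDate.
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN) AS SsnCount
FROM @sup AS sup
WHERE EXISTS (
    SELECT 1
    FROM @summ AS summ
    WHERE summ.ProcessDate  = sup.ProcessDate
      AND summ.MemberSuppID = sup.MemberSuppID
      AND EXISTS (
          SELECT 1
          FROM @acct AS acct
          WHERE acct.ProcessDate   = summ.ProcessDate
            AND acct.AccountNumber = summ.AccountNumber
            -- other criteria for account exclusion would go here
      )
)
GROUP BY sup.ProcessDate;
-- returns a single row: (20160331, 1)
```

Against the real tables, the same pattern would use MemberSuppTable, MemberSummaryTable and AccountTable in place of the table variables, plus the date filter from the question.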

Answer 1: (score: 1)

First, I would rewrite your first query using INNER JOIN instead of WHERE .. IN:

SELECT COUNT(DISTINCT sup.SSN) 
FROM MemberSuppTable as sup 
INNER JOIN MemberSummaryTable AS summ
    ON summ.MemberSuppID = sup.MemberSuppID
INNER JOIN AccountTable AS acct
    ON acct.AccountNumber = summ.AccountNumber
WHERE sup.ProcessDate  = @PROCESSDATE
  AND summ.ProcessDate = @PROCESSDATE
  AND acct.ProcessDate = @PROCESSDATE
  -- other criteria for account exclusion go here.

This looks more compact and (IMHO) more readable.

Now I would change the query so that @PROCESSDATE appears only once:

SELECT COUNT(DISTINCT sup.SSN) 
FROM MemberSuppTable as sup 
INNER JOIN MemberSummaryTable AS summ
    ON summ.MemberSuppID = sup.MemberSuppID
INNER JOIN AccountTable AS acct
    ON acct.AccountNumber = summ.AccountNumber
WHERE sup.ProcessDate  = @PROCESSDATE
  AND summ.ProcessDate = sup.ProcessDate
  AND acct.ProcessDate = sup.ProcessDate
  -- other criteria for account exclusion go here.

You can keep the conditions in the WHERE clause, but I prefer them in the ON clause:

SELECT COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ 
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate = sup.ProcessDate
WHERE sup.ProcessDate = @PROCESSDATE
  -- other criteria for account exclusion go here.

Now it is easy to get the COUNT for each ProcessDate:

SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable as sup 
INNER JOIN MemberSummaryTable AS summ
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate   = sup.ProcessDate
-- WHERE criteria for account exclusion go here. 
GROUP BY sup.ProcessDate

To also filter on "valid_dates", it takes just one more JOIN and a couple of WHERE conditions:

SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable as sup 
INNER JOIN MemberSummaryTable AS summ
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate   = sup.ProcessDate
INNER JOIN arcu.vwARCUProcessDates AS d
    ON d.ProcessDate = sup.ProcessDate
WHERE d.FullDate = d.MonthEndDate 
  AND d.ProcessDate >= @SDATE
  -- AND criteria for account exclusion go here.
GROUP BY sup.ProcessDate

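For better performance, GROUP BY d.ProcessDate might work better, but don't forget to adjust the SELECT list accordingly.

Edit: As mentioned in the comments, the DISTINCT keyword is required if each SSN is to be counted only once, so I have edited the solution.

It must also be noted that, even with DISTINCT, the first query is not completely equivalent to the original query. If sup.SSN is not unique, the two queries may return different results.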
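A sketch of that variant, with only the SELECT list and the GROUP BY changed. It assumes the tables and the view from the question; the @SDATE value and its int type are hypothetical, and whether it actually performs better is something to check against the execution plan.

```sql
-- Sketch only: table, view and column names are taken from the question.
DECLARE @SDATE int = 20151031;  -- hypothetical start date; the int type is an assumption

SELECT d.ProcessDate, COUNT(DISTINCT sup.SSN) AS SsnCount
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate   = sup.ProcessDate
INNER JOIN arcu.vwARCUProcessDates AS d
    ON d.ProcessDate = sup.ProcessDate
WHERE d.FullDate = d.MonthEndDate
  AND d.ProcessDate >= @SDATE
  -- AND criteria for account exclusion go here.
GROUP BY d.ProcessDate;  -- grouping column now matches the SELECT list
```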
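A small illustration of both points, as a sketch on hypothetical table variables with made-up rows (the account level is omitted): member 1 owns two accounts, and members 2 and 3 share one SSN.

```sql
-- Hypothetical data for a single process date.
DECLARE @sup  TABLE (ProcessDate int, MemberSuppID int, SSN char(3));
DECLARE @summ TABLE (ProcessDate int, MemberSuppID int, AccountNumber int);

INSERT INTO @sup  VALUES (20160331, 1, '111'), (20160331, 2, '222'), (20160331, 3, '222');
INSERT INTO @summ VALUES (20160331, 1, 10), (20160331, 1, 20),
                         (20160331, 2, 30), (20160331, 3, 40);

-- IN form (as in the question): each @sup row is tested once.
SELECT COUNT(sup.SSN) AS in_count                  -- 3
FROM @sup AS sup
WHERE sup.MemberSuppID IN (
    SELECT summ.MemberSuppID
    FROM @summ AS summ
    WHERE summ.ProcessDate = sup.ProcessDate
);

-- JOIN form: member 1 appears once per account, so the plain count is inflated.
SELECT COUNT(sup.SSN)          AS join_count,           -- 4
       COUNT(DISTINCT sup.SSN) AS join_distinct_count   -- 2
FROM @sup AS sup
INNER JOIN @summ AS summ
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate;
```

The plain join count (4) is inflated by member 1's second account, while the DISTINCT count (2) merges the two members that share SSN '222'; the IN form counts three member rows.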