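SQL Server: select where most columns match

Time: 2011-07-12 23:19:31

Tags: sql sql-server sql-server-2005

I have a stored procedure that can be passed 1 to 4 variables. It must return the rows where the most columns match or, if there are no matching records, the default rows (where the columns are NULL). The sequences need to be distinct.

Sample table with data: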

Client_Id Project_ID Phase Task Employee Sequence
--------- ---------- ----- ---- -------- --------
NULL      NULL       NULL  NULL Chris    1
NULL      NULL       NULL  NULL Bob      100
500       NULL       NULL  NULL Joe      1
500       2          NULL  NULL Max      1

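So, for Client 100 with any project, phase, or task, the result would just be the default NULL records, Chris and Bob. For Client 500, the result would be Joe and Bob. For Client 500, Project 2, the result would be Max and Bob.

Right now I do this by first checking the task, then joining on phase and checking that no rows overlap, and then doing the same for project and then client. It seems very inefficient; there must be a smarter way. Any ideas?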
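To experiment with the sample data, a minimal setup sketch might look like the following. The table name myTable (the name used in the answer further down), the INT types for the four key columns, and VARCHAR(50) for Employee are all assumptions; the question does not state them.

```sql
-- Hypothetical setup: the table name and all column types are assumptions.
CREATE TABLE myTable (
    Client_Id  INT NULL,
    Project_ID INT NULL,
    Phase      INT NULL,
    Task       INT NULL,
    Employee   VARCHAR(50) NOT NULL,
    Sequence   INT NOT NULL
);

-- One INSERT per row: multi-row VALUES lists are not available in SQL Server 2005.
INSERT INTO myTable (Client_Id, Project_ID, Phase, Task, Employee, Sequence)
VALUES (NULL, NULL, NULL, NULL, 'Chris', 1);
INSERT INTO myTable (Client_Id, Project_ID, Phase, Task, Employee, Sequence)
VALUES (NULL, NULL, NULL, NULL, 'Bob', 100);
INSERT INTO myTable (Client_Id, Project_ID, Phase, Task, Employee, Sequence)
VALUES (500, NULL, NULL, NULL, 'Joe', 1);
INSERT INTO myTable (Client_Id, Project_ID, Phase, Task, Employee, Sequence)
VALUES (500, 2, NULL, NULL, 'Max', 1);
```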

Edit - some query examples. I first check for the case where everything matches:

insert into #TempTracking
select  p.employee, p.sequence
from    invoices i, projects p
where   i.client_id = p.client_id
and     i.project_no = p.project_no
and     i.phase = p.phase
and     i.task = p.task

Then I make the query less and less specific, checking whether the sequence already exists:

insert into #TempTracking
select  p.employee, p.sequence
from    invoices i, projects p
where   (i.client_id = p.client_id or i.client_id is null)
and     (i.project_no = p.project_no or i.project_no is null)
and     (i.phase = p.phase or i.phase is null)
and     (i.task = p.task or i.task is null)
and     NOT EXISTS ( SELECT * FROM #TempTracking t WHERE t.sequence = p.sequence )

1 Answer:

Answer 0 (score: 3)

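"Most columns match" is pretty vague, but I assume you mean that if they search for NULL, or the value in the table is NULL, then the record can be included.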

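If you want the best-matching rows, or all rows when nothing matches, then you need to do something like this (it starts to get very long):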

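-- DECLARE with an initializer needs SQL Server 2008+, so assign separately on 2005
DECLARE @Client_Id VARCHAR(MAX), @Project_ID VARCHAR(MAX), @Phase VARCHAR(MAX), @Task VARCHAR(MAX)
SELECT @Client_Id = '500', @Project_ID = '2', @Phase = NULL, @Task = NULL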

SELECT mt.Employee, mt.Sequence
FROM
  (SELECT Employee, Sequence,
  (
    CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
  ) AS MatchCount
  FROM myTable) mt  -- the derived table needs its own FROM clause and an alias
WHERE mt.MatchCount =
  (
    SELECT MAX(
      CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
      CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
      CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
      CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
    )
    FROM myTable
  )
  -- Intended to prevent duplicate sequence numbers. As written it never
  -- excludes a row, because every outer row already has the maximum MatchCount.
  AND NOT EXISTS (
    SELECT mt2.Employee, mt2.Sequence
    FROM
      (SELECT Employee, Sequence,
      (
        CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
      ) AS MatchCount
      FROM myTable) mt2
    WHERE mt2.MatchCount =
      (
        SELECT MAX(
          CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
          CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
          CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
          CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
        )
        FROM myTable
      )
      AND mt2.Sequence = mt.Sequence AND mt2.MatchCount > mt.MatchCount
  )

Note: when the number of matching fields is zero, this returns all records in the table.

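I'm sure this could be made much less verbose by inserting all the rows into a temp table together with the number of matching columns (MatchCount), which would shorten the query considerably.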

Now, since you need to return unique sequences and the highest-matching row(s) for each one, the result you are looking for is more like:

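-- DECLARE with an initializer needs SQL Server 2008+, so assign separately on 2005
DECLARE @Client_Id VARCHAR(MAX), @Project_ID VARCHAR(MAX), @Phase VARCHAR(MAX), @Task VARCHAR(MAX)
SELECT @Client_Id = '500', @Project_ID = '3', @Phase = NULL, @Task = NULL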

-- SELECT ... INTO creates #myTempTable on the fly; the source table was missing from the original
SELECT Employee, Sequence,
  (
    CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
  ) AS MatchCount,
  (
    CASE WHEN (Client_Id IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Project_ID IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Phase IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Task IS NULL) THEN 1 ELSE 0 END
  ) AS NullCount
--   ,(
--    CASE WHEN (Client_Id = @Client_Id OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
--    CASE WHEN (Project_ID = @Project_ID OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
--    CASE WHEN (Phase = @Phase OR @Phase IS NULL) THEN 1 ELSE 0 END +
--    CASE WHEN (Task = @Task OR @Task IS NULL) THEN 1 ELSE 0 END
--  ) AS MatchCountWithoutNulls
INTO #myTempTable
FROM myTable

SELECT Employee, Sequence
FROM #myTempTable mtt
WHERE MatchCount = (
    SELECT MAX(MatchCount)
    FROM #myTempTable mtt2
    WHERE mtt2.Sequence = mtt.Sequence
  )
  -- Among the best-matching rows of a sequence, keep the most specific one.
  -- Without the MatchCount filter, a poorly matching row with fewer NULLs
  -- would set the minimum and no row would be returned for that sequence.
  AND NullCount = (
    SELECT MIN(NullCount)
    FROM #myTempTable mtt3
    WHERE mtt3.Sequence = mtt.Sequence
      AND mtt3.MatchCount = mtt.MatchCount
  )

Or something very close to it. I don't have a test table made up at the moment, so I can't run it to check.
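Another possible shape for the same idea, offered only as a sketch: discard every row that contradicts a supplied parameter, then keep the most specific surviving row per Sequence with ROW_NUMBER(), which is available from SQL Server 2005 onward. The table name myTable, the INT parameter types, and the matching rule are assumptions. In particular, this sketch treats a NULL column as a wildcard but lets a NULL parameter match only NULL columns, which is stricter than the interpretation used in the queries above.

```sql
-- Sketch under assumptions: table name, INT types, and the matching rule
-- (NULL column = wildcard; NULL parameter matches only NULL columns).
DECLARE @Client_Id INT, @Project_ID INT, @Phase INT, @Task INT;
SELECT @Client_Id = 500, @Project_ID = 2;  -- @Phase and @Task stay NULL

WITH Ranked AS (
    SELECT Employee, Sequence,
           ROW_NUMBER() OVER (
               PARTITION BY Sequence
               -- More non-NULL columns means a more specific rule
               ORDER BY
                   CASE WHEN Client_Id  IS NULL THEN 0 ELSE 1 END +
                   CASE WHEN Project_ID IS NULL THEN 0 ELSE 1 END +
                   CASE WHEN Phase      IS NULL THEN 0 ELSE 1 END +
                   CASE WHEN Task       IS NULL THEN 0 ELSE 1 END DESC
           ) AS rn
    FROM myTable
    -- A row survives only if each column is a wildcard or equals the parameter
    WHERE (Client_Id  IS NULL OR Client_Id  = @Client_Id)
      AND (Project_ID IS NULL OR Project_ID = @Project_ID)
      AND (Phase      IS NULL OR Phase      = @Phase)
      AND (Task       IS NULL OR Task       = @Task)
)
SELECT Employee, Sequence
FROM Ranked
WHERE rn = 1;
```

Traced by hand against the four sample rows from the question, this gives Chris and Bob for Client 100, Joe and Bob for Client 500, and Max and Bob for Client 500 with Project 2. If two rows in the same Sequence are equally specific, ROW_NUMBER() picks one of them arbitrarily, so a tie-breaking column would have to be added to the ORDER BY if that matters.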