Avoiding redundant queries in a join

Date: 2014-04-17 13:38:39

Tags: sql join sql-server-2008-r2

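The salient feature of this question is trying to get two columns from one row without a second query that returns the same row. I have included more information for context.

I have the following (simplified) tables, which represent documents that customers send us and that users here scan in batches.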

          Batches: Id, ...
        Documents: Id, CustomerId, ...
Documents_Batches: Id, BatchId, DocumentId

Document history (creation, status changes, edits, etc.):

   DocumentEvents: Id, DocumentId, UserId, Occurred (datetime)

What I want is a list of the documents in a given batch, along with some event data:

           Result: DocumentId, CustomerId, Created, CreatedBy, ...

How do I get the Created date and the CreatedBy value in the same row?

ALTER PROCEDURE [dbo].[sp_GetBatchDocuments]
@BatchId INT
AS
BEGIN
    SELECT 
        Documents.Id, 
        Documents.CustomerId, 
        MIN(DocumentEvents.Occurred) AS Created,
        /* UserId value of the 'Created' row, AS CreatedBy */
        MAX(DocumentEvents.Occurred) AS Modified
        /* UserId value of the 'Modified' row, AS ModifiedBy */
    FROM
        Documents 
        INNER JOIN Documents_Batches        
        ON Documents.Id = Documents_Batches.DocumentId
        INNER JOIN DocumentEvents
        ON Documents.Id = DocumentEvents.DocumentId
        WHERE Documents_Batches.BatchId = @BatchId
        GROUP BY Documents.Id, Documents.CustomerId;
END

While I could fetch them beforehand, or through a function call, every approach I can think of means multiple queries against the same row.

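Edit: Unless SO has a surprise in store, I have concluded that this is logically impossible without a second query against the same row (one for each date/user column pair I want). To do it, SQL would need a row-valued (as opposed to table-valued) function that, internally, first filters by DocumentId and then filters that result by the lowest/highest date. Whichever way you do it, that is two queries. Maybe it is time to re-evaluate the normalization strategy for this data.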

2 answers:

Answer 0 (score: 3)

Using a CTE and the ROW_NUMBER function, you can do something like this:

WITH MinMax AS (
SELECT d.Id
     , d.CustomerId
     , de.Occurred
     , de.UserId
     , RowAsc = ROW_NUMBER() OVER (PARTITION BY d.Id, d.CustomerId
                                   ORDER BY de.Occurred)
     , RowDesc =ROW_NUMBER() OVER (PARTITION BY d.Id, d.CustomerId
                                   ORDER BY de.Occurred Desc)
FROM   Documents d
       INNER JOIN Documents_Batches d_b ON d.Id = d_b.DocumentId
       INNER JOIN DocumentEvents de ON d.Id = de.DocumentId
WHERE  d_b.BatchId = @BatchId
)
SELECT Id, CustomerId
     , Created = Max(Case When RowAsc = 1 Then Occurred Else Null End)
     , CreatedBy = Max(Case When RowAsc = 1 Then UserId Else Null End)
     , Modified = Max(Case When RowDesc = 1 Then Occurred Else Null End)
     , ModifiedBy = Max(Case When RowDesc = 1 Then UserId Else Null End)
FROM   MinMax
WHERE  1 IN (RowAsc, RowDesc)
GROUP BY Id, CustomerId

In MinMax, the row with RowAsc = 1 is the one with the earliest date, and the row with RowDesc = 1 is the one with the latest date, within each (Id, CustomerId) group.
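
To see how the two row numbers line up, the numbering can be tried on toy data. This is a minimal sketch, not part of the original answer: the table variable `@DocumentEvents` and every row in it are made up for illustration, and only the event table is modelled.

```sql
-- Hypothetical sample data: document 10 has three events, document 11 has one.
DECLARE @DocumentEvents TABLE (
    Id INT, DocumentId INT, UserId INT, Occurred DATETIME
);

INSERT INTO @DocumentEvents (Id, DocumentId, UserId, Occurred) VALUES
    (1, 10, 7, '2014-04-01T09:00:00'),
    (2, 10, 8, '2014-04-02T10:30:00'),
    (3, 10, 9, '2014-04-03T16:45:00'),
    (4, 11, 7, '2014-04-05T08:15:00');

-- Number each document's events from oldest to newest and from newest to oldest.
SELECT DocumentId, UserId, Occurred
     , RowAsc  = ROW_NUMBER() OVER (PARTITION BY DocumentId ORDER BY Occurred)
     , RowDesc = ROW_NUMBER() OVER (PARTITION BY DocumentId ORDER BY Occurred DESC)
FROM   @DocumentEvents
ORDER BY DocumentId, Occurred;
```

For document 10, user 7's event gets RowAsc = 1 and user 9's event gets RowDesc = 1. Document 11 has a single event, so that row has both RowAsc = 1 and RowDesc = 1 and would supply both the Created and the Modified columns.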

Answer 1 (score: 1)

I would do it like this. The two joins are not redundant; they select different information.

SELECT 
    Documents.Id, 
    Documents.CustomerId, 

    MinTable.Created,
    MinTable.UserId AS CreatedBy,

    MaxTable.Modified,
    MaxTable.UserId AS ModifiedBy
FROM
    Documents 
    INNER JOIN Documents_Batches        
    ON Documents.Id = Documents_Batches.DocumentId

    -- Earliest event per document
    INNER JOIN (SELECT e.Occurred AS Created, e.UserId, e.DocumentId
                FROM DocumentEvents e
                WHERE e.Occurred = (SELECT MIN(Occurred) FROM DocumentEvents
                                    WHERE DocumentId = e.DocumentId)) AS MinTable
    ON Documents.Id = MinTable.DocumentId

    -- Latest event per document
    INNER JOIN (SELECT e.Occurred AS Modified, e.UserId, e.DocumentId
                FROM DocumentEvents e
                WHERE e.Occurred = (SELECT MAX(Occurred) FROM DocumentEvents
                                    WHERE DocumentId = e.DocumentId)) AS MaxTable
    ON Documents.Id = MaxTable.DocumentId

    WHERE Documents_Batches.BatchId = @BatchId;
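
One thing worth checking with any approach that matches events on the minimum or maximum date is what happens when two events share the same timestamp. The following is a minimal sketch on made-up data (the table variable `@DocumentEvents` and its rows are hypothetical); it matches each document's events against that document's latest Occurred value.

```sql
-- Hypothetical sample data: the two latest events of document 10 share a timestamp.
DECLARE @DocumentEvents TABLE (
    Id INT, DocumentId INT, UserId INT, Occurred DATETIME
);

INSERT INTO @DocumentEvents (Id, DocumentId, UserId, Occurred) VALUES
    (1, 10, 7, '2014-04-01T09:00:00'),
    (2, 10, 8, '2014-04-03T16:45:00'),
    (3, 10, 9, '2014-04-03T16:45:00');

-- Join every event to the latest Occurred value of its document.
SELECT e.DocumentId, e.UserId AS ModifiedBy, e.Occurred AS Modified
FROM   @DocumentEvents e
       INNER JOIN (SELECT DocumentId, MAX(Occurred) AS Occurred
                   FROM   @DocumentEvents
                   GROUP BY DocumentId) AS latest
       ON  e.DocumentId = latest.DocumentId
       AND e.Occurred   = latest.Occurred;
```

With this data the query returns two rows for document 10 (users 8 and 9), so a join built this way would duplicate the document in the result. If ties are possible in the real data, a tiebreaker such as the event Id is probably needed.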