Return only the row with the latest date from a joined table

Time: 2019-03-03 17:06:04

Tags: sql-server tsql duplicates

After running the following query, I realized that I have duplicates in the QueryExecutionId column.

SELECT DISTINCT qe.QueryExecutionid AS QueryExecutionId,
    wfi.workflowdefinitionid AS FlowId,
    qe.publishing_date AS [Date],
    c.typename AS [Type],
    c.name As Name
INTO #Send
FROM
    [QueryExecutions] qe  
    JOIN [Campaign] c ON qe.target_campaign_id = c.campaignid
    LEFT JOIN [WorkflowInstanceCampaignActivities] wfica ON wfica.queryexecutionresultid = qe.executionresultid
    LEFT JOIN [WorkflowInstances] wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
WHERE qe.[customer_idhash] IS NOT NULL;

For example, when I test with one of these QueryExecutionId values, I get two rows:

select * from #Send
where QueryExecutionId = 169237
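
To see every affected id rather than testing one value at a time, a grouped count over the temp table is one option. This is a minimal sketch that assumes #Send was populated by the query above; DuplicateCount is an alias chosen here for illustration and does not come from the original post.

```sql
-- List each QueryExecutionId that appears more than once in #Send.
SELECT QueryExecutionId,
       COUNT(*) AS DuplicateCount  -- illustrative alias, not from the original post
FROM #Send
GROUP BY QueryExecutionId
HAVING COUNT(*) > 1
ORDER BY DuplicateCount DESC;
```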

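We realized the cause is that the two rows have different FlowId values (the second column returned by the first query). After discussing the issue, we decided to keep, for each QueryExecutionId, the FlowId with the latest date. That date is a column named lastexecutiontime in the third joined table, [WorkflowInstances], which is also the source table of FlowId.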

How can I get a single row per QueryExecutionId, the one with the latest WorkflowInstances.lastexecutiontime, and remove the duplicates?

2 answers:

Answer 0: (score: 1)

You can use a derived table in which first_value is partitioned by workflowinstanceid and ordered by lastexecutiontime desc:

SELECT DISTINCT qe.QueryExecutionid AS QueryExecutionId,
    wfi.FlowId,
    qe.publishing_date AS [Date],
    c.typename AS [Type],
    c.name As Name
INTO #Send
FROM
    [QueryExecutions] qe  
    JOIN [Campaign] c ON qe.target_campaign_id = c.campaignid
    LEFT JOIN [WorkflowInstanceCampaignActivities] wfica ON wfica.queryexecutionresultid = qe.executionresultid
    LEFT JOIN 
    (
        SELECT DISTINCT workflowinstanceid, FIRST_VALUE(workflowdefinitionid) OVER(PARTITION BY workflowinstanceid ORDER BY lastexecutiontime DESC) As FlowId
        FROM [WorkflowInstances]
    ) wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
WHERE qe.[customer_idhash] IS NOT NULL;
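
How FIRST_VALUE behaves depends on the partition key: it returns the first row of each partition under the given ordering. The following self-contained sketch uses made-up inline rows (the ids, FlowId values, and timestamps are hypothetical) and partitions by QueryExecutionId, so that "latest" is evaluated per QueryExecutionId. If workflowinstanceid is unique within [WorkflowInstances] (an assumption, the post does not say), a partition on workflowinstanceid would contain a single row each and would not by itself collapse several instances that belong to one QueryExecutionId.

```sql
-- Hypothetical sample rows: two workflow instances for 169237, one for 169300.
SELECT DISTINCT
    v.QueryExecutionId,
    FIRST_VALUE(v.FlowId) OVER (
        PARTITION BY v.QueryExecutionId
        ORDER BY v.LastExecutionTime DESC   -- most recent execution first
    ) AS FlowId
FROM (VALUES
    (169237, 10, CAST('2019-03-01T08:00:00' AS datetime2)),
    (169237, 20, CAST('2019-03-02T09:30:00' AS datetime2)),
    (169300, 30, CAST('2019-02-15T12:00:00' AS datetime2))
) AS v (QueryExecutionId, FlowId, LastExecutionTime);
-- Returns two rows: (169237, 20) and (169300, 30).
```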

Answer 1: (score: 0)

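Note that DISTINCT applies to the whole set of selected columns, not to a single one. For example:
Row 1 (QueryExecutionId = 169237, typename = test 1)
Row 2 (QueryExecutionId = 169237, typename = test 2)
These two rows are considered distinct.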
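
The point can be checked in isolation with a few inline rows. This is only an illustration; the values are the hypothetical ones from the example above, plus one exact duplicate to show what DISTINCT does remove.

```sql
-- DISTINCT compares entire rows, so only the exact duplicate is dropped.
SELECT DISTINCT v.QueryExecutionId, v.TypeName
FROM (VALUES
    (169237, 'test 1'),
    (169237, 'test 2'),
    (169237, 'test 2')   -- exact duplicate of the previous row
) AS v (QueryExecutionId, TypeName);
-- Returns two rows, both with QueryExecutionId = 169237.
```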

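Try numbering the rows within each partition and keeping only [Seq] = 1 (the code below partitions by QueryExecutionid and orders by date):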

SELECT *
    into #Send
    FROM
    (
           SELECT *,ROW_NUMBER() OVER (PARTITION BY [QueryExecutionid] ORDER BY [Date] DESC) [Seq]  
           FROM
           (
                  SELECT    qe.QueryExecutionid AS QueryExecutionId,
                            wfi.FlowId,
                            qe.publishing_date AS [Date], --should not have any null values
                            qe.[customer_idhash],
                            c.typename AS [Type],
                            c.name As Name

                  FROM [QueryExecutions] qe  
                  JOIN [Campaign] c 
                  ON qe.target_campaign_id = c.campaignid
                  LEFT JOIN [WorkflowInstanceCampaignActivities] wfica 
                  ON wfica.queryexecutionresultid = qe.executionresultid
                  LEFT JOIN 
                    (
                        SELECT DISTINCT workflowinstanceid, FIRST_VALUE(workflowdefinitionid) OVER(PARTITION BY workflowinstanceid ORDER BY lastexecutiontime DESC) As FlowId
                        FROM [WorkflowInstances]
                    ) wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid

           ) a
           WHERE [customer_idhash] IS NOT NULL
    ) b
    WHERE [Seq] = 1 
    ORDER BY [QueryExecutionid]
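
The question asks for the row with the latest lastexecutiontime, whereas the query above orders by [Date] (publishing_date). Since publishing_date comes from [QueryExecutions], it is presumably identical for all duplicates of one QueryExecutionId, which would make the choice of row arbitrary. A possible variant, offered as a sketch rather than a tested solution, ranks directly on wfi.lastexecutiontime. It assumes the table and column names from the question; the CTE name Ranked is made up here.

```sql
-- Sketch: keep one row per QueryExecutionId, preferring the workflow
-- instance with the most recent lastexecutiontime.
-- Assumes #Send does not already exist in the session.
WITH Ranked AS (   -- "Ranked" is an illustrative name
    SELECT qe.QueryExecutionid      AS QueryExecutionId,
           wfi.workflowdefinitionid AS FlowId,
           qe.publishing_date       AS [Date],
           c.typename               AS [Type],
           c.name                   AS Name,
           ROW_NUMBER() OVER (
               PARTITION BY qe.QueryExecutionid
               ORDER BY wfi.lastexecutiontime DESC  -- NULLs sort last under DESC in SQL Server
           ) AS Seq
    FROM [QueryExecutions] qe
    JOIN [Campaign] c ON qe.target_campaign_id = c.campaignid
    LEFT JOIN [WorkflowInstanceCampaignActivities] wfica ON wfica.queryexecutionresultid = qe.executionresultid
    LEFT JOIN [WorkflowInstances] wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
    WHERE qe.[customer_idhash] IS NOT NULL
)
SELECT QueryExecutionId, FlowId, [Date], [Type], Name
INTO #Send
FROM Ranked
WHERE Seq = 1;
```

If two workflow instances share the same lastexecutiontime, ROW_NUMBER picks one of them without a guaranteed order; adding a tie-breaker column to the ORDER BY would make the result deterministic.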