I need to join 3 tables without duplicates and aggregate data from 2 of them

Time: 2015-06-18 00:57:56

Tags: sql-server join aggregate

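I'm working with some email data and have 3 files: Sent, Opens (a subset of Sent), and Clicks (a subset of Opens). Basically, I want to join Opens and Clicks to the Sent file on SubID (the unique identifier).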

There are also 3 email deployments (JobID). For each JobID, I want to count how many times a SubID opened the email and how many links they clicked. Here is an example:

JobID    SubID       Opened      Clicked?  #ofClicks
63809    44775286    0           0          0
89993    44775286    0           0          0
191443   44775286    0           0          0

63809    44775288    3           0          0
89993    44775288    1           0          0
191443   44775288    2           0          0

63809    44775490    4           1          3
89993    44775490    1           0          0
191443   44775490    1           0          0

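Basically, if a SubID is in the Opens file, they opened the email; if a SubID is in the Clicks file, they clicked it. The first two columns in this example come from the Sent file (although all 3 files contain these two columns).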
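The rule above can be sketched in plain Python before writing any SQL. This is a hypothetical model, not the actual files: each Opens or Clicks entry is assumed to be one event row carrying (JobID, SubID), the sample rows are invented so that the output reproduces the example table, and `summarize` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical Sent rows: every subscriber receives all three deployments.
sent = [(j, s) for s in (44775286, 44775288, 44775490) for j in (63809, 89993, 191443)]

# Hypothetical event rows (JobID, SubID); one row per open / per click is an assumption.
opens = ([(63809, 44775288)] * 3 + [(89993, 44775288)] + [(191443, 44775288)] * 2
         + [(63809, 44775490)] * 4 + [(89993, 44775490), (191443, 44775490)])
clicks = [(63809, 44775490)] * 3

open_counts = Counter(opens)
click_counts = Counter(clicks)

def summarize(sent, open_counts, click_counts):
    """Return (JobID, SubID, Opened, Clicked?, #ofClicks) for every Sent row."""
    rows = []
    for job_id, sub_id in sent:
        n_opens = open_counts[(job_id, sub_id)]    # Counter yields 0 for missing keys
        n_clicks = click_counts[(job_id, sub_id)]
        rows.append((job_id, sub_id, n_opens, int(n_clicks > 0), n_clicks))
    return rows

for row in summarize(sent, open_counts, click_counts):
    print(row)
```

Starting from Sent and looking counts up per (JobID, SubID) is what keeps subscribers with no opens or clicks in the output with zeros; in SQL the same role is played by a LEFT JOIN from the send log.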

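I tried to answer part of this (the first 4 columns) with a query, but it doesn't count clicks or opens correctly. It counts all clicks and opens across all JobIDs and puts that same total on every job, which is not what I want. I'm sure there is a way to do this with joins, but I'm still new to SQL and struggling.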

WITH temptable AS
(
SELECT Staging_SendLog.SubID ,jobid,
( SELECT COUNT(0) FROM Staging_DailyOpens 
  WHERE SubscriberID = Staging_SendLog.SubID) AS opens,
( SELECT COUNT(0) FROM Staging_DailyClicks 
  WHERE SubscriberID = Staging_SendLog.SubID) AS clicks
FROM Staging_SendLog 
WHERE  JobID = 63809 OR JobID = 89993
)
SELECT subid, jobid, opens, clicks FROM temptable
GROUP BY  subID, JobID, opens, clicks
ORDER BY 1;
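The symptom described above (the same total repeated for every job) comes from the correlated subqueries matching only on SubscriberID. A minimal sketch, run in SQLite through Python's standard `sqlite3` module as a stand-in for SQL Server, compares that correlation with one that also matches JobID. Assumptions: Staging_DailyOpens has a JobID column, the sample rows are hypothetical, and `opens_per_row` is an illustrative helper.

```python
import sqlite3

def opens_per_row(conn, correlate_job):
    # Same pattern as the question: a correlated COUNT for each Staging_SendLog row.
    # correlate_job=False matches SubscriberID only (the original query);
    # correlate_job=True adds the missing JobID condition.
    job_filter = "AND o.JobID = s.JobID" if correlate_job else ""
    sql = f"""
        SELECT s.JobID, s.SubID,
               (SELECT COUNT(*) FROM Staging_DailyOpens o
                 WHERE o.SubscriberID = s.SubID {job_filter}) AS opens
        FROM Staging_SendLog s
        WHERE s.JobID IN (63809, 89993)
        ORDER BY s.SubID, s.JobID
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staging_SendLog (JobID INTEGER, SubID INTEGER);
    CREATE TABLE Staging_DailyOpens (JobID INTEGER, SubscriberID INTEGER);
    -- Hypothetical rows: 44775288 opened job 63809 three times and job 89993 once.
    INSERT INTO Staging_SendLog VALUES (63809, 44775288), (89993, 44775288);
    INSERT INTO Staging_DailyOpens VALUES
        (63809, 44775288), (63809, 44775288), (63809, 44775288), (89993, 44775288);
""")

print(opens_per_row(conn, correlate_job=False))  # [(63809, 44775288, 4), (89993, 44775288, 4)]
print(opens_per_row(conn, correlate_job=True))   # [(63809, 44775288, 3), (89993, 44775288, 1)]
```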

Can anyone help? I'm using Microsoft SQL Server.

2 answers:

Answer 0 (score: 0)

You are grouping incorrectly:

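-- Totals per subscriber across all jobs (not split by JobID)
WITH clicks AS (
  SELECT SubscriberID AS SubID, COUNT(*) AS Clicks
  FROM Staging_DailyClicks
  GROUP BY SubscriberID
), opens AS (
  SELECT SubscriberID AS SubID, COUNT(*) AS Opens
  FROM Staging_DailyOpens
  GROUP BY SubscriberID
)
SELECT Staging_SendLog.SubID, Staging_SendLog.JobID,
   ISNULL(opens.Opens, 0) AS Opens,
   ISNULL(clicks.Clicks, 0) AS Clicks
FROM Staging_SendLog
-- LEFT JOIN keeps subscribers with no opens or clicks (reported as 0)
LEFT JOIN opens  ON opens.SubID  = Staging_SendLog.SubID
LEFT JOIN clicks ON clicks.SubID = Staging_SendLog.SubID
WHERE Staging_SendLog.JobID IN (63809, 89993)
ORDER BY 1;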
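The kind of join used against the aggregated CTEs matters: an inner JOIN drops every subscriber who has no row in the clicks (or opens) aggregate, while a LEFT JOIN keeps them so the missing count can be reported as 0, as in the question's example. A sketch in SQLite via Python's `sqlite3`, with hypothetical rows, COALESCE in place of SQL Server's ISNULL, and `click_totals` as an illustrative helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staging_SendLog (JobID INTEGER, SubID INTEGER);
    CREATE TABLE Staging_DailyClicks (JobID INTEGER, SubscriberID INTEGER);
    -- Hypothetical rows: 44775490 clicked three times, 44775286 never clicked.
    INSERT INTO Staging_SendLog VALUES (63809, 44775286), (63809, 44775490);
    INSERT INTO Staging_DailyClicks VALUES
        (63809, 44775490), (63809, 44775490), (63809, 44775490);
""")

def click_totals(join_kind):
    # join_kind is "INNER" or "LEFT"; everything else is identical.
    sql = f"""
        WITH clicks AS (
            SELECT SubscriberID AS SubID, COUNT(*) AS Clicks
            FROM Staging_DailyClicks
            GROUP BY SubscriberID
        )
        SELECT s.SubID, s.JobID, COALESCE(c.Clicks, 0) AS Clicks
        FROM Staging_SendLog s
        {join_kind} JOIN clicks c ON c.SubID = s.SubID
        ORDER BY s.SubID
    """
    return conn.execute(sql).fetchall()

print(click_totals("INNER"))  # [(44775490, 63809, 3)]: the non-clicker disappears
print(click_totals("LEFT"))   # [(44775286, 63809, 0), (44775490, 63809, 3)]
```

Because this CTE groups by subscriber only, a subscriber who received several jobs gets the same total on each JobID row.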

Answer 1 (score: 0)

If you need the counts grouped by JobID and SubID, you can try

SELECT 
    s.JobID,
    s.SubID,
    ISNULL(c.ClickCount, 0) AS ClickCount,
    ISNULL(o.OpenCount, 0) AS OpenCount
FROM
    Staging_SendLog s
    LEFT JOIN (SELECT 
                    JobId, 
                    SubscriberId, 
                    COUNT(*) AS ClickCount
                FROM 
                    Staging_DailyClicks 
                WHERE 
                    JobID IN (63809,89993)
                GROUP BY 
                    JobId,
                    SubscriberId 
            ) c ON c.JobID = s.JobID AND c.SubscriberId = s.SubID             
    LEFT JOIN (SELECT 
                    JobId, 
                    SubscriberId, 
                    COUNT(*) AS OpenCount
                FROM 
                    Staging_DailyOpens
                WHERE
                    JobID IN (63809,89993) 
                GROUP BY 
                    JobId,
                    SubscriberId 
            ) o ON o.JobID = s.JobID AND o.SubscriberId = s.SubID  
WHERE  
    s.JobID IN (63809, 89993) 
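To check the per-job query, here is the same shape run in SQLite through Python's `sqlite3`. Assumptions: hypothetical sample rows taken from the question's example, COALESCE instead of ISNULL (COALESCE also works in SQL Server), and an ORDER BY added for a deterministic result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staging_SendLog (JobID INTEGER, SubID INTEGER);
    CREATE TABLE Staging_DailyOpens (JobID INTEGER, SubscriberID INTEGER);
    CREATE TABLE Staging_DailyClicks (JobID INTEGER, SubscriberID INTEGER);
    -- Hypothetical rows mirroring part of the question's example.
    INSERT INTO Staging_SendLog VALUES
        (63809, 44775286), (89993, 44775286),
        (63809, 44775490), (89993, 44775490);
    INSERT INTO Staging_DailyOpens VALUES
        (63809, 44775490), (63809, 44775490), (63809, 44775490), (63809, 44775490),
        (89993, 44775490);
    INSERT INTO Staging_DailyClicks VALUES
        (63809, 44775490), (63809, 44775490), (63809, 44775490);
""")

# Each event table is aggregated per (JobID, SubscriberID) before joining.
PER_JOB_SQL = """
SELECT s.JobID, s.SubID,
       COALESCE(c.ClickCount, 0) AS ClickCount,
       COALESCE(o.OpenCount, 0)  AS OpenCount
FROM Staging_SendLog s
LEFT JOIN (SELECT JobID, SubscriberID, COUNT(*) AS ClickCount
           FROM Staging_DailyClicks
           GROUP BY JobID, SubscriberID) c
       ON c.JobID = s.JobID AND c.SubscriberID = s.SubID
LEFT JOIN (SELECT JobID, SubscriberID, COUNT(*) AS OpenCount
           FROM Staging_DailyOpens
           GROUP BY JobID, SubscriberID) o
       ON o.JobID = s.JobID AND o.SubscriberID = s.SubID
WHERE s.JobID IN (63809, 89993)
ORDER BY s.SubID, s.JobID
"""

for row in conn.execute(PER_JOB_SQL):
    print(row)
```

Aggregating each event table in its own derived table before the join is what avoids duplicates: joining the raw Opens and Clicks rows directly on (JobID, SubID) would pair every open with every click, so for 63809/44775490 you would get 4 × 3 = 12 joined rows and a plain COUNT(*) of either would report 12.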

If you only need totals per SubID (summed across the jobs), you can use

SELECT 
    s.SubID,
    SUM(ISNULL(c.ClickCount,0)) AS ClickCount,
    SUM(ISNULL(o.OpenCount,0)) AS OpenCount
FROM
    Staging_SendLog s
    LEFT JOIN (SELECT 
                    JobId, 
                    SubscriberId, 
                    COUNT(*) AS ClickCount
                FROM 
                    Staging_DailyClicks 
                WHERE
                    JobID IN (63809,89993)
                GROUP BY 
                    JobId,
                    SubscriberId 
            ) c ON c.JobID = s.JobID AND c.SubscriberId = s.SubID             
    LEFT JOIN (SELECT 
                    JobId, 
                    SubscriberId, 
                    COUNT(*) AS OpenCount
                FROM 
                    Staging_DailyOpens
                WHERE
                    JobID IN (63809,89993)
                GROUP BY 
                    JobId,
                    SubscriberId 
            ) o ON o.JobID = s.JobID AND o.SubscriberId = s.SubID    
WHERE  
    s.JobID IN (63809,89993)   
GROUP BY 
    s.SubID
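A sketch of the per-SubID roll-up in SQLite via Python's `sqlite3`, with hypothetical rows and COALESCE standing in for ISNULL. It assumes Staging_SendLog has one row per (JobID, SubID), so each job contributes exactly one pre-aggregated count and the SUM does not double count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staging_SendLog (JobID INTEGER, SubID INTEGER);
    CREATE TABLE Staging_DailyOpens (JobID INTEGER, SubscriberID INTEGER);
    CREATE TABLE Staging_DailyClicks (JobID INTEGER, SubscriberID INTEGER);
    -- Hypothetical rows: one Staging_SendLog row per (JobID, SubID).
    INSERT INTO Staging_SendLog VALUES
        (63809, 44775288), (89993, 44775288),
        (63809, 44775490), (89993, 44775490);
    INSERT INTO Staging_DailyOpens VALUES
        (63809, 44775288), (63809, 44775288), (63809, 44775288), (89993, 44775288),
        (63809, 44775490), (63809, 44775490), (63809, 44775490), (63809, 44775490),
        (89993, 44775490);
    INSERT INTO Staging_DailyClicks VALUES
        (63809, 44775490), (63809, 44775490), (63809, 44775490);
""")

# Per-job counts are computed first, then summed per subscriber.
PER_SUB_SQL = """
SELECT s.SubID,
       SUM(COALESCE(c.ClickCount, 0)) AS ClickCount,
       SUM(COALESCE(o.OpenCount, 0))  AS OpenCount
FROM Staging_SendLog s
LEFT JOIN (SELECT JobID, SubscriberID, COUNT(*) AS ClickCount
           FROM Staging_DailyClicks
           GROUP BY JobID, SubscriberID) c
       ON c.JobID = s.JobID AND c.SubscriberID = s.SubID
LEFT JOIN (SELECT JobID, SubscriberID, COUNT(*) AS OpenCount
           FROM Staging_DailyOpens
           GROUP BY JobID, SubscriberID) o
       ON o.JobID = s.JobID AND o.SubscriberID = s.SubID
WHERE s.JobID IN (63809, 89993)
GROUP BY s.SubID
ORDER BY s.SubID
"""

print(conn.execute(PER_SUB_SQL).fetchall())  # [(44775288, 0, 4), (44775490, 3, 5)]
```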

Edit

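I added the JobID filter inside the subqueries so that, if these are large tables, they are filtered before being aggregated. That should help performance.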
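The extra filter inside the subqueries cannot change the result: the outer WHERE keeps only those two jobs, and the join on JobID means aggregated rows for any other job never match. The sketch below (SQLite via Python's `sqlite3`, hypothetical rows including opens for job 191443, `per_job_opens` as an illustrative helper) checks that both versions return the same rows. Whether the pushed-down filter actually makes SQL Server faster depends on its optimizer and indexes, which this does not measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staging_SendLog (JobID INTEGER, SubID INTEGER);
    CREATE TABLE Staging_DailyOpens (JobID INTEGER, SubscriberID INTEGER);
    INSERT INTO Staging_SendLog VALUES (63809, 44775288), (191443, 44775288);
    -- Hypothetical: opens for job 191443 exist but must not leak into the result.
    INSERT INTO Staging_DailyOpens VALUES
        (63809, 44775288), (191443, 44775288), (191443, 44775288);
""")

def per_job_opens(push_filter):
    # push_filter=True adds the JobID filter inside the derived table.
    inner_where = "WHERE JobID IN (63809, 89993)" if push_filter else ""
    sql = f"""
        SELECT s.JobID, s.SubID, COALESCE(o.OpenCount, 0)
        FROM Staging_SendLog s
        LEFT JOIN (SELECT JobID, SubscriberID, COUNT(*) AS OpenCount
                   FROM Staging_DailyOpens
                   {inner_where}
                   GROUP BY JobID, SubscriberID) o
               ON o.JobID = s.JobID AND o.SubscriberID = s.SubID
        WHERE s.JobID IN (63809, 89993)
        ORDER BY s.JobID
    """
    return conn.execute(sql).fetchall()

print(per_job_opens(push_filter=False))  # [(63809, 44775288, 1)]
print(per_job_opens(push_filter=True))   # [(63809, 44775288, 1)]
```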