Left join generates duplicates

Time: 2020-03-04 00:21:20

Tags: sql postgresql group-by pivot

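I am currently trying to reshape a table to summarize email metrics at the subscriber level. This is what the table I am working with looks like: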

SELECT accountid,
       jobid,
       listid,
       batchid,
       subscriberkey,
       eventdate,
       eventtype,
       isunique,
       triggerersenddefinitionobjectid,
       triggeredsendcustomerkey,
       url,
       linkname,
       linkcontent,
       emailid,
       schedtime,
       pickuptime,
       deliveredtime,
       eventid,
       jobtype,
       jobstatus,
       emailname,
       emailsubject,
       sendtype,
       dynamicemailsubject,
       emailsenddefinition
FROM   email_metrics;  

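I am trying to reshape it so that, for each unique combination of (subscriberkey + emailid), I have data on whether the subscriber opened that email and whether they clicked in it.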

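Here is an example of what the current data looks like (to simplify explaining my problem, I have condensed the table structure to 3 columns; apologies, I am not sure how to insert a table here, so this may be confusing):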

Sample records 1:

Subscriberkey | EmailID | Eventtype
1234          | 2       | Open
1234          | 2       | Click

I would essentially like to reshape this into one record per unique (SubscriberKey, EmailID) combination:

SubscriberKey | EmailID  | Is_Open | Is_Click
1234          | 2        | True    | True

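This would condense all of the data related to a specific subscriber + email send combination and show me the relevant metrics in a single record.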

I have managed to do this successfully before, but my laptop recently died and unfortunately my scripts could not be recovered :(

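So far I have come up with the following, but I am finding duplicates in the data produced by the left joins, and I am having some trouble understanding how to make sure that does not happen: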

WITH email_sent AS (
    SELECT *
    FROM email_metrics em 
    WHERE eventtype ='Sent'
),
    email_open AS (
    SELECT *
    FROM email_metrics em2 
    WHERE eventtype ='Open'
    AND isunique = True),

    email_click AS (
    SELECT * 
    FROM email_metrics em3 
    WHERE eventtype='Click'
    AND isunique = True
)

SELECT DISTINCT a.jobid, 
    a.subscriberkey,
    a.send_time,
    a.emailid,
    a.emailname,
    a.emailsubject,
    a.dynamicemailsubject,
    a.emailsenddefinition,
    a.is_opened,
    a.open_date,
    COALESCE (c.eventtype,'Not Clicked') AS is_click,
    c.eventdate AS click_date,
    c.url,
    c.linkname,
    c.linkcontent
FROM
(SELECT DISTINCT s.jobid,
    s.subscriberkey,
    (s.eventdate) AS send_time,
    s.emailid,
    s.emailname,
    s.emailsubject,
    s.dynamicemailsubject,
    s.emailsenddefinition,
    COALESCE (o.eventtype, 'Not Opened') AS is_opened,
    (o.eventdate) AS open_date
FROM email_sent s 
LEFT JOIN email_open o ON (s.jobid=o.jobid AND s.subscriberkey=o.subscriberkey)) a
LEFT JOIN email_click c ON (a.jobid=c.jobid AND a.subscriberkey=c.subscriberkey);

1 answer:

Answer 0 (score: 1)

I would suggest simply using conditional aggregation for this:

select
    subscriberkey,
    emailid,
    bool_or(eventtype = 'Open') Is_Open,
    bool_or(eventtype = 'Click') Is_Click
from email_metrics
group by subscriberkey, emailid
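
This works because grouping yields exactly one row per (subscriberkey, emailid), whereas a left join returns one row for every matching open/click row, so a subscriber with two matching clicks shows up twice. Below is a minimal, self-contained sketch of the same idea that can be run without the real table. The sample_metrics CTE and its values are made up for illustration, the column types are assumptions, and the FILTER clause (available in PostgreSQL 9.4 and later) is an addition of mine to carry the event dates along, not something taken from the original query:

```sql
-- Hypothetical sample rows standing in for email_metrics (values are invented).
WITH sample_metrics (subscriberkey, emailid, eventtype, eventdate) AS (
    VALUES
        ('1234', 2, 'Sent',  TIMESTAMP '2020-03-01 09:00:00'),
        ('1234', 2, 'Open',  TIMESTAMP '2020-03-01 10:00:00'),
        ('1234', 2, 'Click', TIMESTAMP '2020-03-01 10:05:00'),
        ('1234', 2, 'Click', TIMESTAMP '2020-03-01 10:07:00'),
        ('5678', 2, 'Sent',  TIMESTAMP '2020-03-01 09:00:00')
)
SELECT
    subscriberkey,
    emailid,
    -- true if at least one row in the group has the given event type
    bool_or(eventtype = 'Open')  AS is_open,
    bool_or(eventtype = 'Click') AS is_click,
    -- earliest timestamp per event type; NULL when the event never happened
    min(eventdate) FILTER (WHERE eventtype = 'Sent')  AS send_time,
    min(eventdate) FILTER (WHERE eventtype = 'Open')  AS open_date,
    min(eventdate) FILTER (WHERE eventtype = 'Click') AS first_click_date
FROM sample_metrics
GROUP BY subscriberkey, emailid
ORDER BY subscriberkey;
```

With this sample data, subscriber 1234 comes back as a single row even though there are two Click rows, and subscriber 5678 gets false for both flags with NULL open and click dates. Note that per-click details such as url or linkname are collapsed by the grouping; if you need those, you would have to aggregate them as well (for example with array_agg) or keep them in a separate query.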