How to associate unsubscribed users with one of multiple campaign send dates when there is no job ID joining a specific send date to the unsubscribe date

Time: 2019-05-31 11:30:38

Tags: sql presto amazon-athena

Using Athena (Presto), I need to calculate the unsubscribe rate for some email campaigns we send out weekly. A campaign can have multiple send dates (it can be sent more than once), but there is no identifier in the table that I can use to associate a specific send with the actual unsubscribe.

For example, an email from the same campaign_id is sent on '12-04-2019' and again on '17-04-2019'. I can see some users unsubscribing after the second communication has been sent out (from 17-04-2019 onward), but I cannot be certain that the unsubscribe comes from the second email rather than the first one.

The only thing I can do is decide that all unsubscribe clicks (I have an unsubscribe_date) on or after the second email's sent date are attributed to the second email.
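The attribution rule above can be sketched in Presto SQL as follows. This is only a minimal illustration under stated assumptions, not a confirmed solution: the inline `sends` and `unsubs` rows are hypothetical sample data standing in for email_send and email_unsubscribe, and `attributed` and `attributed_date_sent` are names invented for the example. Each unsubscribe is assigned to the latest send of the same campaign to the same user whose date is on or before the unsubscribe date.

```sql
-- Hypothetical sample data; column names follow the tables in this question.
WITH sends AS (
    SELECT * FROM (VALUES
        ('c1', DATE '2019-04-12', 1),
        ('c1', DATE '2019-04-12', 2),
        ('c1', DATE '2019-04-12', 3),
        ('c1', DATE '2019-04-17', 2),
        ('c1', DATE '2019-04-17', 3)
    ) AS s (campaign_sent, date_sent, id_sent)
),
unsubs AS (
    SELECT * FROM (VALUES
        ('c1', DATE '2019-04-14', 1),
        ('c1', DATE '2019-04-18', 2)
    ) AS u (campaign_unsubscribe, unsubscribe_date, id_unsubscribe)
),
-- For each unsubscribe, keep only the most recent send on or before it.
attributed AS (
    SELECT
        u.campaign_unsubscribe,
        u.id_unsubscribe,
        MAX(s.date_sent) AS attributed_date_sent
    FROM unsubs u
    JOIN sends s
      ON s.id_sent = u.id_unsubscribe
     AND s.campaign_sent = u.campaign_unsubscribe
     AND s.date_sent <= u.unsubscribe_date
    GROUP BY u.campaign_unsubscribe, u.id_unsubscribe
)
-- Count sends and attributed unsubscribes per campaign and send date.
SELECT
    s.campaign_sent,
    s.date_sent,
    COUNT(DISTINCT s.id_sent) AS tot_sent,
    COUNT(DISTINCT a.id_unsubscribe) AS unsubscribed_users,
    CAST(COUNT(DISTINCT a.id_unsubscribe) AS double)
        / COUNT(DISTINCT s.id_sent) AS unsubscribe_rate
FROM sends s
LEFT JOIN attributed a
  ON a.campaign_unsubscribe = s.campaign_sent
 AND a.id_unsubscribe = s.id_sent
 AND a.attributed_date_sent = s.date_sent
GROUP BY s.campaign_sent, s.date_sent
ORDER BY s.date_sent
```

With these sample rows, user 1's unsubscribe on 14 April is attributed to the 12 April send and user 2's unsubscribe on 18 April to the 17 April send, so no unsubscribe is counted against more than one send date.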

I am struggling to write SQL that does this. I was thinking of using a window function, 'OVER (PARTITION BY campaign_id, sent_date)', to sum unsubscribed users by campaign and sent date, but it doesn't seem to work. I am using the Simba Athena JDBC driver.


SELECT
campaign_sent,
date_sent,
COUNT(DISTINCT id_sent) AS tot_sent,
COUNT(DISTINCT (CASE WHEN date_sent <= Unsubscribe_Date THEN id_unsubscribe END)) OVER (PARTITION BY campaign_sent, date_sent) AS unsubscribed_users_per_sent_date
FROM
(SELECT campaign_sent,
        date_sent,
        id AS id_sent
 FROM email_send
 GROUP BY 1, 2, 3
) t
LEFT JOIN
(SELECT Unsubscribe_Date,
        campaign_unsubscribe,
        id AS id_unsubscribe
 FROM email_unsubscribe
 GROUP BY 1, 2, 3
) z
  ON t.id_sent = z.id_unsubscribe
 AND t.campaign_sent = z.campaign_unsubscribe
GROUP BY 1, 2
ORDER BY date_sent ASC

I am not sure PARTITION BY is doing what I expect it to do, and I also get this error:

[Simba]AthenaJDBC An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 5:1: '"count"(DISTINCT (CASE WHEN ("date_sent" <= "Unsubscribe_Date") THEN "id_unsubscribe" END)) OVER (PARTITION BY "campaign_sent", "date_sent")' must be an aggregate expression or appear in GROUP BY clause [SQL State=HY000, DB Errorcode=100071]

If I add that expression to the GROUP BY, I of course get a different error:

[Simba]AthenaJDBC An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 5:1: GROUP BY clause cannot contain aggregations or window functions: ["count"(DISTINCT (CASE WHEN ("date_sent" <= "Unsubscribe_Date") THEN "id_unsubscribe" END)) OVER (PARTITION BY "campaign_sent", "date_sent")] [SQL State=HY000, DB Errorcode=100071]
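Both errors are consistent with how SQL orders evaluation: window functions run after GROUP BY, so a window function's argument must itself be an aggregate or a grouping column, and a window expression cannot be a grouping key. Since the query already groups by campaign_sent and date_sent, one possible way around the syntax errors is to drop the OVER clause and use a plain conditional aggregate. The sketch below assumes hypothetical joined rows (the inline `joined` data is invented for illustration) in place of the real join of email_send and email_unsubscribe.

```sql
-- Hypothetical rows: one per send, with the user's unsubscribe if there is one.
WITH joined AS (
    SELECT * FROM (VALUES
        ('c1', DATE '2019-04-12', 1, DATE '2019-04-14', 1),
        ('c1', DATE '2019-04-12', 2, DATE '2019-04-18', 2),
        ('c1', DATE '2019-04-17', 2, DATE '2019-04-18', 2),
        ('c1', DATE '2019-04-17', 3, CAST(NULL AS date), CAST(NULL AS integer))
    ) AS j (campaign_sent, date_sent, id_sent, unsubscribe_date, id_unsubscribe)
)
SELECT
    campaign_sent,
    date_sent,
    COUNT(DISTINCT id_sent) AS tot_sent,
    -- Plain aggregate: GROUP BY already partitions by campaign and send date.
    COUNT(DISTINCT CASE WHEN date_sent <= unsubscribe_date
                        THEN id_unsubscribe END) AS unsubscribed_users_per_sent_date
FROM joined
GROUP BY campaign_sent, date_sent
ORDER BY date_sent
```

This form avoids the two syntax errors but does not by itself solve the attribution problem: in the sample rows, user 2's unsubscribe on 18 April satisfies date_sent <= unsubscribe_date for both sends, so it is counted under 12 April and 17 April alike.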

0 answers:

No answers