Using Athena (Presto), I need to calculate the unsubscribe rate for some email campaigns we send out weekly. A campaign can have multiple send dates (it can be sent more than once), but there is no identifier in the table that I can use to associate an unsubscribe with a specific send.
E.g. an email from the same campaign_id is sent on '12-04-2019' and '17-04-2019'. I can see some users unsubscribing after the second communication has been sent out (from 17-04-2019 onward), but I cannot be certain that the unsubscribe comes from the second email rather than the first one.
The only thing I can do is decide that all unsubscribe clicks (I have unsubscribe_date) from the second email's 'sent_date' onward are attributed to the second email.
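To make that rule concrete, here is a rough sketch of the mapping I have in mind: each unsubscribe is tied to the latest send of the same campaign to the same id that is not later than the unsubscribe date. The sample rows, the campaign name 'camp_a', the CTE names and the attributed_date_sent column are invented for illustration, and I am assuming the date columns are real DATE values rather than strings.

```sql
-- Sketch only: the sample rows below are made up to illustrate the rule.
WITH sends AS (
    SELECT * FROM (VALUES
        ('camp_a', DATE '2019-04-12', 1),
        ('camp_a', DATE '2019-04-17', 1),
        ('camp_a', DATE '2019-04-12', 2),
        ('camp_a', DATE '2019-04-17', 2)
    ) AS t (campaign_sent, date_sent, id_sent)
),
unsubs AS (
    SELECT * FROM (VALUES
        ('camp_a', DATE '2019-04-14', 1),
        ('camp_a', DATE '2019-04-18', 2)
    ) AS t (campaign_unsubscribe, unsubscribe_date, id_unsubscribe)
)
SELECT
    u.campaign_unsubscribe,
    u.id_unsubscribe,
    u.unsubscribe_date,
    -- latest send on or before the unsubscribe gets the credit
    max(s.date_sent) AS attributed_date_sent
FROM unsubs u
JOIN sends s
    ON  s.id_sent = u.id_unsubscribe
    AND s.campaign_sent = u.campaign_unsubscribe
    AND s.date_sent <= u.unsubscribe_date
GROUP BY u.campaign_unsubscribe, u.id_unsubscribe, u.unsubscribe_date
```

With these rows, I would expect id 1 (unsubscribed on 2019-04-14) to be attributed to the 2019-04-12 send and id 2 (unsubscribed on 2019-04-18) to the 2019-04-17 send. What I still need is the count of those attributed unsubscribes per campaign and send date, next to the total sent.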
I am struggling to write SQL that does the job. I was thinking of using a window function, 'over (partition by campaign_id, sent_date)', to sum unsubscribed users by campaign and sent_date, but it doesn't seem to work. I am using the Simba Athena JDBC driver.
SELECT
    campaign_sent,
    date_sent,
    COUNT(DISTINCT id_sent) AS tot_sent,
    COUNT(DISTINCT (CASE WHEN date_sent <= Unsubscribe_Date THEN id_unsubscribe END)) OVER (PARTITION BY campaign_sent, date_sent) AS unsubscribed_users_per_sent_date
FROM
    (SELECT campaign_sent,
            date_sent,
            id AS id_sent
     FROM email_send
     GROUP BY 1, 2, 3
    ) t
LEFT JOIN
    (SELECT Unsubscribe_Date,
            campaign_unsubscribe,
            id AS id_unsubscribe
     FROM email_unsubscribe
     GROUP BY 1, 2, 3
    ) z
    ON t.id_sent = z.id_unsubscribe
    AND t.campaign_sent = z.campaign_unsubscribe
GROUP BY 1, 2
ORDER BY date_sent ASC
I am not sure PARTITION BY is doing what I expect it to do, and on top of that I get this error:
[Simba]AthenaJDBC An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 5:1: '"count"(DISTINCT (CASE WHEN ("date_sent" <= "Unsubscribe_Date") THEN "id_unsubscribe" END)) OVER (PARTITION BY "campaign_sent", "date_sent")' must be an aggregate expression or appear in GROUP BY clause [SQL State=HY000, DB Errorcode=100071]
If I add that expression to the GROUP BY, I of course get another error:
[Simba]AthenaJDBC An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 5:1: GROUP BY clause cannot contain aggregations or window functions: ["count"(DISTINCT (CASE WHEN ("date_sent" <= "Unsubscribe_Date") THEN "id_unsubscribe" END)) OVER (PARTITION BY "campaign_sent", "date_sent")] [SQL State=HY000, DB Errorcode=100071]