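Here is a sample table of website visitors. As you can see, visitors sometimes do not provide their email, and they may also switch to a different email address over time.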
I want to update this table according to the following requirements:
Is it possible to do this in Redshift or T-SQL?
Thanks, everyone!
Answer 0 (score: 0)
Assuming the table is named Visits and its primary key consists of the columns Visitor_id and Activity_Date, you can do the following in T-SQL:
update a
set a.Email = coalesce(
        -- select the email used previously
        (
            select top 1 Email from Visits
            where Email is not null and Activity_Date < a.Activity_Date and Visitor_id = a.Visitor_id
            order by Activity_Date desc
        ),
        -- if there was no email used previously then select the email used next
        (
            select top 1 Email from Visits
            where Email is not null and Activity_Date > a.Activity_Date and Visitor_id = a.Visitor_id
            order by Activity_Date
        )
    )
from Visits a
where a.Email is null;
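One way to try the statement above without a SQL Server instance is to run an equivalent UPDATE in SQLite through Python's standard sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for T-SQL (so TOP 1 becomes LIMIT 1), and the sample rows are hypothetical, because the question's table did not survive.

```python
import sqlite3

# Hypothetical sample rows (Visitor_id, Activity_Date, Email); None = no email given.
# ISO date strings compare correctly as text.
ROWS = [
    (1, "2020-01-01", None),
    (1, "2020-01-02", "a@x.com"),
    (1, "2020-01-03", None),
    (1, "2020-01-04", "b@x.com"),
    (1, "2020-01-05", None),
    (2, "2020-01-01", None),
    (3, "2020-01-01", None),
    (3, "2020-01-02", "c@x.com"),
]

# SQLite translation of the T-SQL UPDATE: TOP 1 becomes LIMIT 1 and the
# outer (target) table is referenced by its name, Visits.
FILL_SQL = """
UPDATE Visits
SET Email = COALESCE(
    -- most recent earlier email of the same visitor
    (SELECT p.Email FROM Visits p
      WHERE p.Email IS NOT NULL
        AND p.Visitor_id = Visits.Visitor_id
        AND p.Activity_Date < Visits.Activity_Date
      ORDER BY p.Activity_Date DESC LIMIT 1),
    -- otherwise the earliest later email of the same visitor
    (SELECT n.Email FROM Visits n
      WHERE n.Email IS NOT NULL
        AND n.Visitor_id = Visits.Visitor_id
        AND n.Activity_Date > Visits.Activity_Date
      ORDER BY n.Activity_Date LIMIT 1))
WHERE Email IS NULL
"""


def fill_emails(rows):
    """Load rows into an in-memory Visits table, run the UPDATE, return all rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Visits (Visitor_id INTEGER, Activity_Date TEXT, Email TEXT,"
        " PRIMARY KEY (Visitor_id, Activity_Date))"
    )
    con.executemany("INSERT INTO Visits VALUES (?, ?, ?)", rows)
    con.execute(FILL_SQL)
    result = con.execute(
        "SELECT Visitor_id, Activity_Date, Email FROM Visits"
        " ORDER BY Visitor_id, Activity_Date"
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in fill_emails(ROWS):
        print(row)
```

Visitor 2 never gave an email, so both subqueries return NULL and the row stays NULL; the leading visit of visitor 3 takes the later c@x.com.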
Alternatively, the same update can be written with joins and ROW_NUMBER() instead of correlated subqueries:
update v
set Email = vv.Email
from Visits v
join (
    select
        v.Visitor_id,
        coalesce(a.Email, b.Email) as Email,
        v.Activity_Date,
        row_number() over (partition by v.Visitor_id, v.Activity_Date
                           order by a.Activity_Date desc, b.Activity_Date) as Row_num
    from Visits v
    -- previous visits with email
    left join Visits a
        on a.Visitor_id = v.Visitor_id
        and a.Email is not null
        and a.Activity_Date < v.Activity_Date
    -- next visits with email if there are no previous visits
    left join Visits b
        on b.Visitor_id = v.Visitor_id
        and b.Email is not null
        and b.Activity_Date > v.Activity_Date
        and a.Visitor_id is null
    where v.Email is null
) vv
    on vv.Visitor_id = v.Visitor_id
    and vv.Activity_Date = v.Activity_Date
where
    vv.Row_num = 1;
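The derived table vv does all the work in the join-based version, so it can be previewed as a plain SELECT before running the UPDATE. The sketch below does that in SQLite via Python's sqlite3 module; it assumes SQLite 3.25 or newer (needed for ROW_NUMBER()) and uses hypothetical sample rows.

```python
import sqlite3

# Hypothetical sample rows (Visitor_id, Activity_Date, Email); None = no email given.
ROWS = [
    (1, "2020-01-01", None),
    (1, "2020-01-02", "a@x.com"),
    (1, "2020-01-03", None),
    (1, "2020-01-04", "b@x.com"),
    (1, "2020-01-05", None),
    (2, "2020-01-01", None),
    (3, "2020-01-01", None),
    (3, "2020-01-02", "c@x.com"),
]

# The "vv" subquery of the join-based UPDATE, filtered to Row_num = 1.
# Join "a" matches earlier emailed visits; join "b" matches later ones only
# when "a" found nothing (a.Visitor_id IS NULL).
PREVIEW_SQL = """
SELECT Visitor_id, Activity_Date, Email FROM (
    SELECT v.Visitor_id,
           COALESCE(a.Email, b.Email) AS Email,
           v.Activity_Date,
           ROW_NUMBER() OVER (PARTITION BY v.Visitor_id, v.Activity_Date
                              ORDER BY a.Activity_Date DESC, b.Activity_Date) AS Row_num
    FROM Visits v
    LEFT JOIN Visits a
           ON a.Visitor_id = v.Visitor_id
          AND a.Email IS NOT NULL
          AND a.Activity_Date < v.Activity_Date
    LEFT JOIN Visits b
           ON b.Visitor_id = v.Visitor_id
          AND b.Email IS NOT NULL
          AND b.Activity_Date > v.Activity_Date
          AND a.Visitor_id IS NULL
    WHERE v.Email IS NULL
) vv
WHERE Row_num = 1
ORDER BY Visitor_id, Activity_Date
"""


def preview_fills(rows):
    """Return (Visitor_id, Activity_Date, new Email) for every visit lacking an email."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Visits (Visitor_id INTEGER, Activity_Date TEXT, Email TEXT,"
        " PRIMARY KEY (Visitor_id, Activity_Date))"
    )
    con.executemany("INSERT INTO Visits VALUES (?, ?, ?)", rows)
    result = con.execute(PREVIEW_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in preview_fills(ROWS):
        print(row)
```

Each visit without an email yields exactly one Row_num = 1 row, which is what lets the outer UPDATE join on (Visitor_id, Activity_Date) without duplicates.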
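Answer 1 (score: 0)
For each visitor_id, you can replace a null email with the previous non-null value; if there is none, use the next non-null value. You can get both values like this: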
select
    v.*, v_prev.email as prev_email, v_next.email as next_email
from
    visits v
    left join visits v_prev on v.visitor_id = v_prev.visitor_id
        and v_prev.activity_date = (select max(v2.activity_date) from visits v2
                                    where v2.visitor_id = v.visitor_id and v2.activity_date < v.activity_date and v2.email is not null)
    left join visits v_next on v.visitor_id = v_next.visitor_id
        and v_next.activity_date = (select min(v2.activity_date) from visits v2
                                    where v2.visitor_id = v.visitor_id and v2.activity_date > v.activity_date and v2.email is not null)
where
    v.email is null
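The query above returns both candidates side by side; the value to write back is simply coalesce(prev_email, next_email). A minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in engine and hypothetical sample rows, runs the same joins with that coalesce added as a hypothetical filled_email column:

```python
import sqlite3

# Hypothetical sample rows (visitor_id, activity_date, email); None = no email given.
ROWS = [
    (1, "2020-01-01", None),
    (1, "2020-01-02", "a@x.com"),
    (1, "2020-01-03", None),
    (1, "2020-01-04", "b@x.com"),
    (1, "2020-01-05", None),
    (2, "2020-01-01", None),
    (3, "2020-01-01", None),
    (3, "2020-01-02", "c@x.com"),
]

# Same prev/next joins as the answer, plus the final choice of email.
PREV_NEXT_SQL = """
SELECT v.visitor_id, v.activity_date,
       v_prev.email AS prev_email,
       v_next.email AS next_email,
       COALESCE(v_prev.email, v_next.email) AS filled_email
FROM visits v
LEFT JOIN visits v_prev
       ON v.visitor_id = v_prev.visitor_id
      AND v_prev.activity_date = (SELECT MAX(v2.activity_date) FROM visits v2
                                   WHERE v2.visitor_id = v.visitor_id
                                     AND v2.activity_date < v.activity_date
                                     AND v2.email IS NOT NULL)
LEFT JOIN visits v_next
       ON v.visitor_id = v_next.visitor_id
      AND v_next.activity_date = (SELECT MIN(v2.activity_date) FROM visits v2
                                   WHERE v2.visitor_id = v.visitor_id
                                     AND v2.activity_date > v.activity_date
                                     AND v2.email IS NOT NULL)
WHERE v.email IS NULL
ORDER BY v.visitor_id, v.activity_date
"""


def prev_next_emails(rows):
    """Return (visitor_id, activity_date, prev_email, next_email, filled_email)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE visits (visitor_id INTEGER, activity_date TEXT, email TEXT,"
        " PRIMARY KEY (visitor_id, activity_date))"
    )
    con.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = con.execute(PREV_NEXT_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in prev_next_emails(ROWS):
        print(row)
```

Because (visitor_id, activity_date) is assumed unique, each MAX/MIN subquery identifies at most one joined row, so the joins never multiply the result.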
Answer 2 (score: 0)
In SQL Server or Redshift, you can compute the email in a subquery using window functions:
select t.*,
       coalesce(email,
                max(email) over (partition by visitor_id, grp),
                max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
               ) as imputed_email
from (select t.*,
             min(case when email is not null then activity_date end) over
                 (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
             count(email) over (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
      from t
     ) t;
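The trick in this answer is a gaps-and-islands grouping: grp is a running count of non-null emails, so every visit after an email shares that email's grp, and visits before the first email fall into grp 0, which is then filled from the visitor's first email. The sketch below runs the same query in SQLite through Python's sqlite3 module and also returns grp so the grouping is visible; SQLite 3.25 or newer and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical sample rows (visitor_id, activity_date, email); None = no email given.
ROWS = [
    (1, "2020-01-01", None),
    (1, "2020-01-02", "a@x.com"),
    (1, "2020-01-03", None),
    (1, "2020-01-04", "b@x.com"),
    (1, "2020-01-05", None),
    (2, "2020-01-01", None),
    (3, "2020-01-01", None),
    (3, "2020-01-02", "c@x.com"),
]

WINDOW_SQL = """
SELECT visitor_id, activity_date, email, grp,
       COALESCE(email,
                -- previous email: the only non-null email in this grp
                MAX(email) OVER (PARTITION BY visitor_id, grp),
                -- no previous email (grp 0): the visitor's first email
                MAX(CASE WHEN activity_date = first_email_date THEN email END)
                    OVER (PARTITION BY visitor_id)) AS imputed_email
FROM (SELECT t.*,
             MIN(CASE WHEN email IS NOT NULL THEN activity_date END) OVER
                 (PARTITION BY visitor_id ORDER BY activity_date
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_email_date,
             COUNT(email) OVER
                 (PARTITION BY visitor_id ORDER BY activity_date
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
      FROM t) t
ORDER BY visitor_id, activity_date
"""


def impute_with_windows(rows):
    """Return (visitor_id, activity_date, email, grp, imputed_email) for every row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (visitor_id INTEGER, activity_date TEXT, email TEXT,"
        " PRIMARY KEY (visitor_id, activity_date))"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(WINDOW_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in impute_with_windows(ROWS):
        print(row)
```

Unlike the correlated-subquery answers, this version scans the table once per window rather than once per null row, which is usually the better fit for a columnar engine such as Redshift.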
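Then you can use it in an update:
update t
    set email = tt.imputed_email
    from (select t.*,
                 coalesce(email,
                          max(email) over (partition by visitor_id, grp),
                          max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
                         ) as imputed_email
          from (select t.*,
                       min(case when email is not null then activity_date end) over
                           (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
                       count(email) over (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
                from t
               ) t
         ) tt
    where tt.visitor_id = t.visitor_id and tt.activity_date = t.activity_date and
          t.email is null;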
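All three answers implement the same rule: use the visitor's previous non-null email, otherwise the next one. To cross-check any of them on a small extract, the rule can be written in plain Python as a forward fill followed by a fill from the first email. This is a sketch, not part of any answer; the (visitor_id, activity_date, email) row layout and the sample data are assumptions.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows (visitor_id, activity_date, email); None = no email given.
ROWS = [
    (1, "2020-01-01", None),
    (1, "2020-01-02", "a@x.com"),
    (1, "2020-01-03", None),
    (1, "2020-01-04", "b@x.com"),
    (1, "2020-01-05", None),
    (2, "2020-01-01", None),
    (3, "2020-01-01", None),
    (3, "2020-01-02", "c@x.com"),
]


def impute_emails(rows):
    """Fill missing emails per visitor: previous non-null email, else the next one."""
    result = []
    # Sort by (visitor_id, activity_date) only, so None emails are never compared.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        visits = list(group)
        emails = [email for _, _, email in visits]
        # Forward pass: carry the most recent non-null email onto later visits.
        filled, last = [], None
        for email in emails:
            if email is not None:
                last = email
            filled.append(last)
        # Only visits before the first email are still None; their "next"
        # non-null email is the visitor's first email (None if there is none).
        first = next((e for e in emails if e is not None), None)
        filled = [f if f is not None else first for f in filled]
        result.extend((v, d, f) for (v, d, _), f in zip(visits, filled))
    return result


if __name__ == "__main__":
    for row in impute_emails(ROWS):
        print(row)
```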