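I've been handed a challenge that is a bit beyond my depth, so I'll jump right in.
I have a sample dataset in BigQuery that you can use for testing: https://bigquery.cloud.google.com/table/robotic-charmer-726:bl_test_data.complex_problem
I need to work out the SQL that queries my table and does the following: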
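Aggregate using the rules below (I'll start with a single email address and add a second one at the end):
As a general note up front, lowercase everything so that Ben = ben when aggregating.
Email is the broadest level of aggregation, and rows are rolled up by the lowercased email.
Sum the amounts for each lowercased email, shown in blue in the image below.
Next, consider first name and last name together, and select the pair with the highest summed amount for the lowercased first name plus last name.
Note that the first name or the last name is never considered on its own. See below: Ben totals 160 while Kathleen totals only 150, yet Kathleen is still selected because her full name has a higher amount than any other full name.
Next, select the lowercased full address for the SELECTED NAME, based on the highest amount.
As with names, a full address takes all of its columns together.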
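Now I'll add another email address, and we do the same thing.
Each lowercased email address is considered on its own. I realize now that I should have made this clearer in my images, but I don't want to redo them... too much work. So I hope I've been clear.
I hope you find this a really interesting challenge!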
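To pin the rules down, here is a minimal Python sketch of what I expect the query to compute. The sample rows, the `ROWS` name, and the `summarize` function are all made up for illustration and are not taken from the real table. Ties are broken arbitrarily (whichever key was seen first), which I have not specified above.

```python
from collections import defaultdict

# Hypothetical rows: (email, first_name, last_name, address, city, state, zip, amount)
ROWS = [
    ("A@x.com", "Ben", "Smith", "1 Main St", "Austin", "TX", "78701", 100),
    ("a@x.com", "ben", "Jones", "2 Oak Ave", "Austin", "TX", "78702", 60),
    ("a@x.com", "Kathleen", "Doe", "3 Elm Rd", "Dallas", "TX", "75201", 90),
    ("a@X.com", "kathleen", "DOE", "4 Pine Ln", "Dallas", "TX", "75202", 60),
    ("b@y.com", "Ann", "Lee", "5 Lake Dr", "Reno", "NV", "89501", 40),
]


def summarize(rows):
    """Return {email: (first, last, address, city, state, zip, email_total)}."""
    email_total = defaultdict(int)  # email -> amount
    name_total = defaultdict(int)   # (email, first, last) -> amount
    addr_total = defaultdict(int)   # (email, first, last, full_address) -> amount

    for email, first, last, address, city, state, zip_code, amount in rows:
        # Lowercase everything so that Ben == ben when aggregating.
        email, first, last = email.lower(), first.lower(), last.lower()
        full_address = (address.lower(), city.lower(), state.lower(), zip_code)
        email_total[email] += amount
        name_total[(email, first, last)] += amount
        addr_total[(email, first, last, full_address)] += amount

    result = {}
    for email, total in email_total.items():
        # Pick the full name (first + last together) with the highest amount.
        names = {k: v for k, v in name_total.items() if k[0] == email}
        best_name = max(names, key=names.get)
        # Pick the full address with the highest amount for that name only.
        addrs = {k: v for k, v in addr_total.items() if k[:3] == best_name}
        best_addr = max(addrs, key=addrs.get)
        result[email] = (best_name[1], best_name[2]) + best_addr[3] + (total,)
    return result
```

With these made-up rows, "ben" adds up to 160 across two different last names, while "kathleen doe" adds up to 150 as a full name, so Kathleen is the selected name for a@x.com, matching the rule described above.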
Answer 0 (score: 2)
There may be a more concise way to do this, but this will give you the answer you need:
-- BigQuery legacy SQL. For each lowercased email, return the email-level total,
-- the full name with the highest amount, and that name's highest-amount address.
select email, first_name, last_name, address, city, state, zip, total_amount amount
from (
  select d.email email, d.first_name first_name, d.last_name last_name,
    d.amount amount, d.total_amount total_amount,
    e.address address, e.city city, e.state state, e.zip zip,
    -- rank the selected name's addresses by amount within each email
    row_number() over (partition by e.email order by e.amount desc) ord
  from (
    -- d: one row per email with its selected full name and the email total
    select a.email email, a.first_name first_name, a.last_name last_name,
      b.amount amount, c.amount total_amount
    from (
      -- a: every lowercased email / name / address combination
      select
        lower(email) email, lower(first_name) first_name, lower(last_name) last_name,
        lower(concat(first_name, last_name)) as name_group,
        lower(address) address, lower(city) city, lower(state) state,
        lower(concat(address, city, state)) as location_group,
        zip, sum(amount) amount
      from [robotic-charmer-726:bl_test_data.complex_problem]
      group by 1,2,3,4,5,6,7,8,9
    ) a
    inner join (
      -- b: the full name with the highest summed amount per email
      select email, first_name, last_name, name_group, amount
      from (
        select email, first_name, last_name, name_group, amount,
          row_number() over (partition by email order by amount desc) as ord
        from (
          select lower(email) email, lower(first_name) first_name, lower(last_name) last_name,
            lower(concat(first_name, last_name)) as name_group, sum(amount) amount
          from [robotic-charmer-726:bl_test_data.complex_problem]
          group by 1, 2, 3, 4
        )
      )
      where ord = 1
    ) b
    -- match on email as well as name, so a name selected for one email
    -- does not pull in rows that belong to a different email
    on a.email = b.email and a.name_group = b.name_group
    inner join (
      -- c: total amount per email
      select lower(email) email, sum(amount) amount
      from [robotic-charmer-726:bl_test_data.complex_problem]
      group by 1
    ) c
    on a.email = c.email
    group by 1,2,3,4,5
  ) d
  inner join (
    -- e: amount per email / name / full address
    select lower(email) email, lower(first_name) first_name, lower(last_name) last_name,
      lower(address) address, lower(city) city, lower(state) state, zip,
      lower(concat(lower(address), lower(city), lower(state), zip)) as location_group,
      sum(amount) amount
    from [robotic-charmer-726:bl_test_data.complex_problem]
    group by 1,2,3,4,5,6,7,8
  ) e
  on d.email = e.email and d.first_name = e.first_name and d.last_name = e.last_name
)
where ord = 1
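For comparison, the same three steps (email total, best full name, best address for that name) can be laid out as one named subquery per step. The sketch below is not BigQuery legacy SQL: it runs against an in-memory SQLite database through Python's `sqlite3` module, and it assumes SQLite 3.25 or newer for `ROW_NUMBER()`. The table contents, `SAMPLE_ROWS`, and `run` are made up for illustration, so treat it as a way to check the logic, not as a drop-in replacement for the query above.

```python
import sqlite3

# Hypothetical rows: (email, first_name, last_name, address, city, state, zip, amount)
SAMPLE_ROWS = [
    ("A@x.com", "Ben", "Smith", "1 Main St", "Austin", "TX", "78701", 100),
    ("a@x.com", "ben", "Jones", "2 Oak Ave", "Austin", "TX", "78702", 60),
    ("a@x.com", "Kathleen", "Doe", "3 Elm Rd", "Dallas", "TX", "75201", 90),
    ("a@X.com", "kathleen", "DOE", "4 Pine Ln", "Dallas", "TX", "75202", 60),
    ("b@y.com", "Ann", "Lee", "5 Lake Dr", "Reno", "NV", "89501", 40),
]

QUERY = """
WITH base AS (              -- lowercase everything once
  SELECT lower(email) AS email, lower(first_name) AS first_name,
         lower(last_name) AS last_name, lower(address) AS address,
         lower(city) AS city, lower(state) AS state, zip, amount
  FROM complex_problem
),
email_total AS (            -- step 1: total per email
  SELECT email, SUM(amount) AS total_amount FROM base GROUP BY email
),
name_total AS (             -- step 2a: amount per email and full name
  SELECT email, first_name, last_name, SUM(amount) AS amount
  FROM base GROUP BY email, first_name, last_name
),
best_name AS (              -- step 2b: top full name per email
  SELECT email, first_name, last_name FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY amount DESC) AS ord
    FROM name_total
  ) WHERE ord = 1
),
address_total AS (          -- step 3a: amount per address of the chosen name
  SELECT email, first_name, last_name, address, city, state, zip,
         SUM(amount) AS amount
  FROM base JOIN best_name USING (email, first_name, last_name)
  GROUP BY email, first_name, last_name, address, city, state, zip
),
best_address AS (           -- step 3b: top address per email
  SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY amount DESC) AS ord
    FROM address_total
  ) WHERE ord = 1
)
SELECT email, first_name, last_name, address, city, state, zip,
       total_amount AS amount
FROM best_address JOIN email_total USING (email)
ORDER BY email
"""


def run(rows):
    """Load rows into an in-memory table and return the summary rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE complex_problem (email TEXT, first_name TEXT, last_name TEXT,"
        " address TEXT, city TEXT, state TEXT, zip TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO complex_problem VALUES (?,?,?,?,?,?,?,?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Calling `run(SAMPLE_ROWS)` returns one row per lowercased email. Because every step is partitioned or joined on email, a name that wins under one email cannot affect another email's result.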