BigQuery nested challenge involving joins and a HAVING (or WHERE) clause

Time: 2015-01-09 08:19:16

Tags: sql join nested google-bigquery having

I've been handed a challenge that is a bit over my head, so I'll just dive in.

I have a sample dataset in BigQuery that you can use for testing here: https://bigquery.cloud.google.com/table/robotic-charmer-726:bl_test_data.complex_problem

I need to work out the SQL that queries my table and does the following:

In simplest terms

Aggregate using the following rules (I'll start with one email address and add another at the end):

As a general note up front, everything is lowercased so that Ben = ben when aggregating.

Email is the broadest level of aggregation, and rows are grouped by its lowercase form.

Sum the amounts for each lowercase email, as shown in blue in the image below.

aggregate amounts by lowercase emails

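Next, consider the first and last name together, and select the pair whose lowercased first and last name has the highest summed amount.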

first AND last name

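Note that the first name and the last name are not considered separately. See below: Ben's amounts total 160 while Kathleen's total only 150, yet Kathleen is still selected because the amount for her full name is higher than for any other full name.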

First or last names NOT considered separately
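
The distinction can be checked with a small Python sketch. The rows and last names below are hypothetical, chosen only to match the totals quoted above (Ben 160 in total, Kathleen 150); they are not taken from the actual table.

```python
from collections import defaultdict

# Hypothetical (first_name, last_name, amount) rows for a single email address.
rows = [
    ("Ben", "Smith", 80),
    ("ben", "Jones", 80),
    ("Kathleen", "Doe", 100),
    ("KATHLEEN", "doe", 50),
]

by_first = defaultdict(int)  # totals per lowercase first name (NOT the rule)
by_full = defaultdict(int)   # totals per lowercase (first, last) pair (the rule)
for first, last, amount in rows:
    by_first[first.lower()] += amount
    by_full[(first.lower(), last.lower())] += amount

top_first = max(by_first, key=by_first.get)
top_full = max(by_full, key=by_full.get)
print(top_first, by_first[top_first])  # first names alone would favor Ben
print(top_full, by_full[top_full])     # full names favor Kathleen
```

Grouping on the pair rather than on each column is what lets the smaller first-name total win here.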

Next, for the SELECTED NAME, select the lowercase full address with the highest amount.

As with the name, the full address is evaluated with all of its columns taken together.

Full address selection

Now I'll add another email address, and we do the same thing.

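Each lowercase email address is considered separately. I realize now that I should have made this clearer in my pictures, but I don't want to redo them... too much work. So I hope I've made it clear.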
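
Taken together, the rules above could be sketched in Python as follows. This is only an illustration of the intended logic, not a BigQuery solution: the `summarize` helper and the sample rows are hypothetical, the column names are assumed to match the table, and ties are broken by whichever candidate is seen first.

```python
from collections import defaultdict

def summarize(rows):
    """Return {email: (full_name, full_address, total_amount)} per lowercase email."""
    email_total = defaultdict(int)  # amount per email
    name_total = defaultdict(int)   # amount per (email, full name)
    addr_total = defaultdict(int)   # amount per (email, full name, full address)
    for r in rows:
        email = r["email"].lower()
        name = (r["first_name"].lower(), r["last_name"].lower())
        addr = (r["address"].lower(), r["city"].lower(),
                r["state"].lower(), str(r["zip"]))
        email_total[email] += r["amount"]
        name_total[(email, name)] += r["amount"]
        addr_total[(email, name, addr)] += r["amount"]

    result = {}
    for email, total in email_total.items():
        # Full name with the highest amount within this email.
        names = {n: a for (e, n), a in name_total.items() if e == email}
        best_name = max(names, key=names.get)
        # Full address with the highest amount for that selected name.
        addrs = {ad: a for (e, n, ad), a in addr_total.items()
                 if e == email and n == best_name}
        best_addr = max(addrs, key=addrs.get)
        result[email] = (best_name, best_addr, total)
    return result

# Hypothetical sample rows, not taken from the real table.
cols = ["email", "first_name", "last_name", "address", "city", "state", "zip", "amount"]
rows = [dict(zip(cols, v)) for v in [
    ("A@x.com", "Ben", "Smith", "1 Main St", "Austin", "TX", "78701", 80),
    ("a@x.com", "ben", "Jones", "2 Oak Ave", "Austin", "TX", "78701", 80),
    ("a@x.com", "Kathleen", "Doe", "3 Elm Rd", "Dallas", "TX", "75201", 60),
    ("a@x.com", "kathleen", "doe", "4 Pine Ln", "Dallas", "TX", "75201", 90),
    ("b@y.com", "Ann", "Lee", "5 Lake Dr", "Reno", "NV", "89501", 40),
]]
result = summarize(rows)
for email, picked in result.items():
    print(email, picked)
```

Each email is resolved independently, so a name that wins under one email has no effect on another.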

I hope you find this a really interesting challenge!

1 answer:

Answer 0: (score: 2)

There may be a more concise way to do this, but this will give you the answer you need:

    -- BigQuery legacy SQL: one row per lowercase email, with its top full name,
    -- that name's top full address, and the email's total amount.
    select email, first_name, last_name, address, city, state, zip, total_amount amount
    from (
        -- rank the selected name's addresses by amount (ord = 1 is the highest)
        select d.email email, d.first_name first_name, d.last_name last_name, d.amount amount, d.total_amount total_amount, e.address address, e.city city, e.state state, e.zip zip, row_number() over (partition by e.email order by e.amount desc) ord
        from (
            select a.email email, a.first_name first_name, a.last_name last_name, b.amount amount, c.amount total_amount
            from (
              -- a: amounts per email, full name, and full address
              select
                lower(email) email, lower(first_name) first_name, lower(last_name) last_name, lower(concat(first_name, last_name)) as name_group, lower(address) address, lower(city) city, lower(state) state, lower(concat(address,city,state)) as location_group, zip, sum(amount) amount
              from [robotic-charmer-726:bl_test_data.complex_problem]
              group by 1,2,3,4,5,6,7,8,9
            ) a
            inner join (
              -- b: the full name with the highest amount for each email
              select email, first_name, last_name, name_group, amount
              from (
                select email, first_name, last_name, name_group, amount, row_number() over (partition by email order by amount desc) as ord
                from (
                  select lower(email) email, lower(first_name) first_name, lower(last_name) last_name, lower(concat(first_name,last_name)) as name_group, sum(amount) amount
                  from [robotic-charmer-726:bl_test_data.complex_problem]
                  group by 1, 2, 3, 4
                )
              )
              where ord = 1
            ) b
            -- match on email as well, so a name that wins under one email is not applied to another
            on a.email = b.email and a.name_group = b.name_group
            inner join (
              -- c: total amount per email
              select lower(email) email, sum(amount) amount
              from [robotic-charmer-726:bl_test_data.complex_problem]
              group by 1
            ) c
            on a.email = c.email
            group by 1,2,3,4,5
        ) d
        inner join (
            -- e: amounts per email, full name, and full address
            select lower(email) email, lower(first_name) first_name, lower(last_name) last_name, lower(address) address, lower(city) city, lower(state) state, zip, lower(concat(lower(address),lower(city), lower(state), zip)) as location_group, sum(amount) amount
            from [robotic-charmer-726:bl_test_data.complex_problem]
            group by 1,2,3,4,5,6,7,8
        ) e
        on d.email = e.email and d.first_name = e.first_name and d.last_name = e.last_name
    )
    where ord = 1