SQL CASE statement based on number of occurrences

Time: 2014-11-03 21:37:18

Tags: sql sql-server join case

I have a table (t1) containing email addresses, users, and domains:

     email                user         domain
joe123@domain.com        joe123        domain.com
sue234@email.net         sue234        email.net
      ...                  ...          ...

Another table (t2) indicates whether an email sent to an address has been opened:

  Opened             Email
    0            joe123@domain.com
    1            sue234@email.net
    0            jack55@mybarber.com
   ...               ...

I want to join t1.domain onto t2, but only for domains that occur more than 100 times.

I can build a table of occurrence counts with:

SELECT domain, count(domain) cntDomain
from table1
group by domain

which gives:

   domain         cntDomain
 domain.com       5000
 email.net        4300
 mybarber.com     67

The result table I want looks like this:

  Opened             Email                 domain
    0            joe123@domain.com         domain.com
    1            sue234@email.net          email.net
    0            jack55@mybarber.com       other 
   ...               ...

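But I can't figure out the join (I assume it will be a left join, so that infrequent values get the 'other' value) or the CASE statement that brings in the domain if it occurs more than 100 times and 'other' if not.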

4 answers:

Answer 0: (score: 0)

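select *
from table2 t2
inner join
(
    -- domains that occur more than 100 times in table1
    SELECT domain, count(1) cntDomain
    from table1
    group by domain
    having count(1) > 100
) t1 on t2.email like '%@' + t1.domain
-- the derived table has no email column, so match on the domain suffix instead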

Answer 1: (score: 0)

It is unclear whether all the emails in the first table are in the second. If so, you can do this:

-- count rows per domain in table1, the table that actually has the domain column
select t2.*, t1.domain
from (select t1.*, count(*) over (partition by domain) as cnt
      from table1 t1
     ) t1 join
     table2 t2
     on t1.email = t2.email
where cnt > 100;

If not, we can check for the domain inside the email address:

select t2.*, t1.domain
from table2 t2 left join
     (select t1.domain, count(*) as cnt
      from table1 t1
      group by t1.domain
     ) t1
     on t2.email like '%@' + t1.domain and
        cnt > 100;

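Expect the performance of this version to be really, really bad, since a LIKE pattern that starts with a wildcard generally cannot use an index seek.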
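The suffix match above can be tried out on a small in-memory database. The sketch below is not SQL Server: it assumes Python's built-in sqlite3 module as a stand-in, so string concatenation is written `||` instead of `+`. The function name `label_by_suffix`, the sample rows, and the lowered threshold are all made up for illustration.

```python
import sqlite3

# Hypothetical sample rows; zed@domain.com has no row in t1 on purpose.
SAMPLE_T1 = [
    ("a1@domain.com", "a1", "domain.com"),
    ("a2@domain.com", "a2", "domain.com"),
    ("a3@domain.com", "a3", "domain.com"),
    ("b1@email.net", "b1", "email.net"),
]
SAMPLE_T2 = [
    (0, "a1@domain.com"),
    (1, "zed@domain.com"),
    (0, "b1@email.net"),
]


def label_by_suffix(t1_rows, t2_rows, threshold):
    """Return (Opened, Email, domain) rows; domain is None unless the
    email's domain occurs more than `threshold` times in t1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (email TEXT, user TEXT, domain TEXT)")
    con.execute("CREATE TABLE t2 (Opened INTEGER, Email TEXT)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", t1_rows)
    con.executemany("INSERT INTO t2 VALUES (?, ?)", t2_rows)
    query = """
        SELECT t2.Opened, t2.Email, d.domain
        FROM t2
        LEFT JOIN (SELECT domain, COUNT(*) AS cnt
                   FROM t1
                   GROUP BY domain) AS d
          ON t2.Email LIKE '%@' || d.domain
         AND d.cnt > ?
        ORDER BY t2.Email
    """
    rows = con.execute(query, (threshold,)).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    # Threshold lowered from 100 to 2 so a handful of rows is enough.
    for row in label_by_suffix(SAMPLE_T1, SAMPLE_T2, threshold=2):
        print(row)
```

Because the match is on the text of the address, zed@domain.com still picks up its domain even though it never appears in t1, which is the case the email-equality join cannot handle.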

Answer 2: (score: 0)

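This approach uses an inner query to get the counts, then a CASE statement to turn each count into either the domain or the string 'Other'. I tested it on some toy data to make sure it works, but I can't say anything about its performance.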

It feels a bit awkward because t1 is queried twice: once to get the domain, and again to get the counts. Either way, it gets the job done.

If the threshold changes, you can swap the number 100 for another number (or a variable).

select 
  t2.Opened
, t2.Email
, case when t3.cntDomain > 100 then t3.domain else 'Other' end as domain
from t2
left outer join t1 on t2.Email = t1.email
left outer join (
    select t1.domain, count(1) cntDomain
    from t1
    left outer join t2 on t1.email = t2.email
    group by t1.domain
) as t3 on t1.domain = t3.domain
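To see the threshold behave as a variable, here is a minimal sketch that runs the same join shape with a bound parameter. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, counts rows in t1 only (a simplification of the inner query above), and uses made-up sample rows; `label_with_case` is a hypothetical helper name.

```python
import sqlite3

# Hypothetical sample rows in the question's t1 / t2 layout.
T1_ROWS = [
    ("joe123@domain.com", "joe123", "domain.com"),
    ("joe124@domain.com", "joe124", "domain.com"),
    ("joe125@domain.com", "joe125", "domain.com"),
    ("sue234@email.net", "sue234", "email.net"),
    ("sue235@email.net", "sue235", "email.net"),
    ("jack55@mybarber.com", "jack55", "mybarber.com"),
]
T2_ROWS = [
    (0, "joe123@domain.com"),
    (1, "sue234@email.net"),
    (0, "jack55@mybarber.com"),
    (1, "nobody@nowhere.org"),  # address with no row in t1
]

CASE_QUERY = """
    SELECT t2.Opened,
           t2.Email,
           CASE WHEN t3.cntDomain > :threshold
                THEN t3.domain ELSE 'Other' END AS domain
    FROM t2
    LEFT OUTER JOIN t1 ON t2.Email = t1.email
    LEFT OUTER JOIN (SELECT domain, COUNT(*) AS cntDomain
                     FROM t1
                     GROUP BY domain) AS t3
      ON t1.domain = t3.domain
    ORDER BY t2.Email
"""


def label_with_case(threshold):
    """Label every t2 row with its domain, or 'Other' below the threshold."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (email TEXT, user TEXT, domain TEXT)")
    con.execute("CREATE TABLE t2 (Opened INTEGER, Email TEXT)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", T1_ROWS)
    con.executemany("INSERT INTO t2 VALUES (?, ?)", T2_ROWS)
    rows = con.execute(CASE_QUERY, {"threshold": threshold}).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    # Threshold lowered from 100 so the tiny sample produces both outcomes.
    for row in label_with_case(threshold=2):
        print(row)
```

An address that is missing from t1 ends up with a NULL count, the comparison is not true, and the CASE falls through to 'Other', so every t2 row is kept.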

Edit

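If you don't like CASE statements, this approach may be more elegant. Add a HAVING clause to the inner query. Now, because of the left join, t3.domain will be NULL whenever the count does not exceed the threshold. Add a little ISNULL in the SELECT list to coalesce the NULLs and you're all set.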

select 
  t2.Opened
, t2.Email
, ISNULL(t3.domain, 'Other') as domain
from t2
left outer join t1 on t2.Email = t1.email
left outer join (
    select t1.domain, count(1) cntDomain
    from t1
    left outer join t2 on t1.email = t2.email
    group by t1.domain
    having count(1) > 100
) as t3 on t1.domain = t3.domain
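A minimal sketch of this HAVING variant on toy data follows. It assumes Python's built-in sqlite3 module rather than SQL Server, so it uses COALESCE (available in both engines) in place of ISNULL, counts rows in t1 only, and lowers the threshold; the helper name `label_with_having` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: domain.com x3, email.net x2, mybarber.com x1.
T1_SAMPLE = [
    ("joe123@domain.com", "joe123", "domain.com"),
    ("joe124@domain.com", "joe124", "domain.com"),
    ("joe125@domain.com", "joe125", "domain.com"),
    ("sue234@email.net", "sue234", "email.net"),
    ("sue235@email.net", "sue235", "email.net"),
    ("jack55@mybarber.com", "jack55", "mybarber.com"),
]
T2_SAMPLE = [
    (0, "joe123@domain.com"),
    (1, "sue234@email.net"),
    (0, "jack55@mybarber.com"),
]

HAVING_QUERY = """
    SELECT t2.Opened,
           t2.Email,
           COALESCE(t3.domain, 'Other') AS domain
    FROM t2
    LEFT OUTER JOIN t1 ON t2.Email = t1.email
    LEFT OUTER JOIN (SELECT domain
                     FROM t1
                     GROUP BY domain
                     HAVING COUNT(*) > :threshold) AS t3
      ON t1.domain = t3.domain
    ORDER BY t2.Email
"""


def label_with_having(threshold):
    """Infrequent domains are filtered out by HAVING, so the left join
    leaves t3.domain NULL and COALESCE substitutes 'Other'."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (email TEXT, user TEXT, domain TEXT)")
    con.execute("CREATE TABLE t2 (Opened INTEGER, Email TEXT)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", T1_SAMPLE)
    con.executemany("INSERT INTO t2 VALUES (?, ?)", T2_SAMPLE)
    rows = con.execute(HAVING_QUERY, {"threshold": threshold}).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in label_with_having(threshold=1):
        print(row)
```

The filtering moves from the SELECT list into the derived table, so the outer query only has to replace NULL with a default.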

Cheers!

Answer 3: (score: 0)

I think the following query should solve your problem:

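       SELECT t2.Opened,
       t2.Email,
       CASE WHEN tempt1.email is NULL THEN 'Other' ELSE tempt1.domain END as domain
       FROM t2 LEFT JOIN (SELECT email, domain
       FROM t1
       -- keep only rows whose domain occurs more than 100 times;
       -- selecting email while grouping by domain alone is not valid in SQL Server
       WHERE domain IN (SELECT domain
                        FROM t1
                        GROUP BY domain
                        HAVING count(domain) > 100)) tempt1 on t2.Email = tempt1.email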