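What I want to do is take source data containing 29,268 records and create six distinct sets from it, unique by e-mail address (which is a field in the data). Below is my base query, which returns 4,878 records. Conceptually the query would be run six times, but each run needs to return a new set of 4,878 records whose e-mail addresses did not appear in any previous run. I have a rough idea, but I am not sure how to proceed from here. I would rate myself as intermediate in SQL, and this one has me stumped. Any ideas?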
select top 1124 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%YAHOO.COM%'
union all
select top 402 * from
Master_Subscribers_Score_GTE_5
where ([E-mail Address] like '%HOTMAIL.COM%' or [E-mail Address] like '%LIVE.COM%')
union all
select top 45 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%AOL.COM%'
union all
select top 2353 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%GMAIL.COM%'
union all
select top 164 * from
Master_Subscribers_Score_GTE_5
where ([E-mail Address] like '%ATT.COM%' or [E-mail Address] like '%SBCGLOBAL.NET%')
union all
select top 8 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%COX.NET%'
union all
select top 3 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%VERIZON.NET%'
union all
select top 70 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%RR.COM%'
union all
select top 712 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] not like '%YAHOO.COM%' and
[E-mail Address] not like '%HOTMAIL.COM%' and
[E-mail Address] not like '%LIVE.COM%' and
[E-mail Address] not like '%AOL.COM%' and
[E-mail Address] not like '%GMAIL.COM%' and
[E-mail Address] not like '%ATT.COM%' and
[E-mail Address] not like '%SBCGLOBAL.NET%' and
[E-mail Address] not like '%COX.NET%' and
[E-mail Address] not like '%VERIZON.NET%' and
[E-mail Address] not like '%RR.COM%'
Answer 0 (score: 1)
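First, using LIKE has its drawbacks. Take a look at this post.
You can use SUBSTRING and CHARINDEX to get the e-mail address provider (host).
The following expression extracts the e-mail provider: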
SUBSTRING(Email, CHARINDEX('@', Email, 1)+1, LEN(Email) - CHARINDEX('@', Email, 1))
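To see what the expression returns, here is a minimal sketch that runs it against a few made-up addresses. The sample values and the aliases `t` and `EmailDomain` are illustrative assumptions, not data from the question.

```sql
-- Sample addresses are hypothetical; swap in the real table and column.
SELECT t.Email,
       SUBSTRING(t.Email,
                 CHARINDEX('@', t.Email, 1) + 1,                -- first character after '@'
                 LEN(t.Email) - CHARINDEX('@', t.Email, 1)      -- number of characters after '@'
       ) AS EmailDomain
FROM (VALUES ('alice@yahoo.com'),
             ('bob@Hotmail.com'),
             ('carol@mail.example.org')) AS t(Email);
```

The domain comes back exactly as stored, including its original letter case, so any comparison against it still depends on the collation or on an explicit UPPER().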
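Now that you have the part you need to filter on, you can use it to filter the records, then use ROW_NUMBER() to number the records for each provider, and use that number for further filtering. You can use CASE to assign the records to their groups.
Here is an example: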
SELECT *
FROM (
    SELECT *
        , CASE
            WHEN UPPER(EmailDomain) = 'YAHOO.COM' AND RN <= 1124
                THEN 'Group 1'
            WHEN UPPER(EmailDomain) IN ('HOTMAIL.COM', 'LIVE.COM') AND RN <= 402
                THEN 'Group 2'
            WHEN UPPER(EmailDomain) = 'AOL.COM' AND RN <= 45
                THEN 'Group 3'
            WHEN UPPER(EmailDomain) = 'GMAIL.COM' AND RN <= 2353
                THEN 'Group 4'
            WHEN UPPER(EmailDomain) IN ('ATT.COM', 'SBCGLOBAL.NET') AND RN <= 164
                THEN 'Group 5'
            WHEN UPPER(EmailDomain) = 'COX.NET' AND RN <= 8
                THEN 'Group 6'
            WHEN UPPER(EmailDomain) = 'VERIZON.NET' AND RN <= 3
                THEN 'Group 7'
            WHEN UPPER(EmailDomain) = 'RR.COM' AND RN <= 70
                THEN 'Group 8'
            WHEN UPPER(EmailDomain) NOT IN ('YAHOO.COM','HOTMAIL.COM','LIVE.COM','AOL.COM','GMAIL.COM','ATT.COM','SBCGLOBAL.NET','COX.NET','VERIZON.NET','RR.COM') AND RN <= 712
                THEN 'Group 9'
            ELSE NULL
        END EmailGroup
    FROM (
        -- RN restarts at 1 for every distinct EmailDomain
        SELECT *, ROW_NUMBER() OVER(PARTITION BY EmailDomain ORDER BY EmailDomain) RN
        FROM (
            SELECT
                Email
                -- everything after the '@'
                , SUBSTRING(Email, CHARINDEX('@', Email, 1)+1, LEN(Email) - CHARINDEX('@', Email, 1)) EmailDomain
            FROM
                Master_Subscribers_Score_GTE_5
        ) D
    ) C
) E
WHERE
    EmailGroup IS NOT NULL
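Note that I used ROW_NUMBER() instead of SELECT TOP x. Records that match none of the conditions get NULL as their group, which gives me an easy way to show only what I need: the outer WHERE simply excludes the NULL rows.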
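One thing to keep in mind: because the query above partitions by EmailDomain, the row number restarts for every distinct domain. A limit that covers several domains (ATT.COM plus SBCGLOBAL.NET, or the catch-all group) is therefore applied per domain rather than per group. The following is a sketch of one way around that, assigning the group first and numbering within the group. The inline quota list `q` and the aliases `d`, `g`, `r` are illustrative assumptions, and `Email` stands in for the question's `[E-mail Address]` column, as in the query above.

```sql
SELECT r.Email, r.EmailDomain, r.EmailGroup, r.RN
FROM (
    SELECT g.*,
           -- number rows within each group, not within each domain
           ROW_NUMBER() OVER (PARTITION BY g.EmailGroup ORDER BY g.Email) AS RN
    FROM (
        SELECT d.Email, d.EmailDomain,
               CASE
                   WHEN d.EmailDomain = 'YAHOO.COM'                    THEN 'Group 1'
                   WHEN d.EmailDomain IN ('HOTMAIL.COM', 'LIVE.COM')   THEN 'Group 2'
                   WHEN d.EmailDomain = 'AOL.COM'                      THEN 'Group 3'
                   WHEN d.EmailDomain = 'GMAIL.COM'                    THEN 'Group 4'
                   WHEN d.EmailDomain IN ('ATT.COM', 'SBCGLOBAL.NET')  THEN 'Group 5'
                   WHEN d.EmailDomain = 'COX.NET'                      THEN 'Group 6'
                   WHEN d.EmailDomain = 'VERIZON.NET'                  THEN 'Group 7'
                   WHEN d.EmailDomain = 'RR.COM'                       THEN 'Group 8'
                   ELSE 'Group 9'                                      -- everything else
               END AS EmailGroup
        FROM (
            -- upper-cased text after the '@'; SUBSTRING stops at the end of the string
            SELECT Email,
                   UPPER(SUBSTRING(Email, CHARINDEX('@', Email) + 1, LEN(Email))) AS EmailDomain
            FROM Master_Subscribers_Score_GTE_5
        ) AS d
    ) AS g
) AS r
JOIN (VALUES ('Group 1', 1124), ('Group 2', 402), ('Group 3', 45),
             ('Group 4', 2353), ('Group 5', 164), ('Group 6', 8),
             ('Group 7', 3),    ('Group 8', 70),  ('Group 9', 712)
     ) AS q(EmailGroup, Quota)
  ON q.EmailGroup = r.EmailGroup
WHERE r.RN <= q.Quota;
```

Keeping the limits in one list also means a changed quota is edited in a single place instead of inside the CASE expression.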
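I used UPPER() because I don't know your database collation, that is, whether it is case-sensitive or not, so I used it to be safe. If your database is case-insensitive, you don't need it.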
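If you would rather check than guess, the sketch below shows how to look up the current database's collation, and how to force a case-insensitive comparison for a single predicate instead of wrapping the column in UPPER(). The sample values and the choice of `Latin1_General_CI_AS` are assumptions for illustration; any case-insensitive (`_CI_`) collation available on your server would work the same way.

```sql
-- Collation of the current database; "_CI_" in the name means case-insensitive.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- Case-insensitive comparison for this predicate only (sample values are hypothetical).
SELECT v.EmailDomain
FROM (VALUES ('Yahoo.com'), ('YAHOO.COM'), ('gmail.com')) AS v(EmailDomain)
WHERE v.EmailDomain COLLATE Latin1_General_CI_AS = 'yahoo.com';
```

The second query matches both spellings of the Yahoo domain and skips the Gmail row, regardless of the database default.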
I hope this helps.
Answer 1 (score: 0)
with ranked as (
select m.*, n = row_number() over (partition by b.bucket order by m.[E-mail Address])
from Master_Subscribers_Score_GTE_5 m
outer apply (select bucket from (values
('yahoo.com'), ('hotmail.com,live.com'),
('aol.com'), ('gmail.com'), ('att.com,sbcglobal.net'),
('cox.net'), ('verizon.net'), ('rr.com'))
_(bucket) where exists (
select * from string_split(bucket, ',')
where m.[E-mail Address] like '%' + value + '%')) b)
select * from ranked where n % 6 = 0
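...should give you 1124 for yahoo.com, 402 for hotmail.com and live.com, and so on, since each run takes every sixth row of each bucket. Then query where n % 6 = 1 for the next set, n % 6 = 2 for the one after that, and so on up to n % 6 = 5.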
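The same idea can be written so that every row carries its set number, which makes it easy to pull any of the six sets or to check their sizes. This is only a sketch under a few assumptions: it reuses the LIKE patterns from the question, it assumes each e-mail address occurs once in the table (otherwise deduplicate first), and the names `bucketed`, `numbered`, `bucket` and `set_no` are made up for illustration. Each set receives one sixth of each bucket, so the per-provider counts match the TOP values in the question only if every bucket holds about six times its quota.

```sql
WITH bucketed AS (
    SELECT m.*,
           -- CASE stops at the first match, so each row lands in exactly one bucket
           CASE
               WHEN m.[E-mail Address] LIKE '%YAHOO.COM%'      THEN 'yahoo'
               WHEN m.[E-mail Address] LIKE '%HOTMAIL.COM%'
                 OR m.[E-mail Address] LIKE '%LIVE.COM%'       THEN 'hotmail/live'
               WHEN m.[E-mail Address] LIKE '%AOL.COM%'        THEN 'aol'
               WHEN m.[E-mail Address] LIKE '%GMAIL.COM%'      THEN 'gmail'
               WHEN m.[E-mail Address] LIKE '%ATT.COM%'
                 OR m.[E-mail Address] LIKE '%SBCGLOBAL.NET%'  THEN 'att/sbcglobal'
               WHEN m.[E-mail Address] LIKE '%COX.NET%'        THEN 'cox'
               WHEN m.[E-mail Address] LIKE '%VERIZON.NET%'    THEN 'verizon'
               WHEN m.[E-mail Address] LIKE '%RR.COM%'         THEN 'rr'
               ELSE 'other'
           END AS bucket
    FROM Master_Subscribers_Score_GTE_5 AS m
),
numbered AS (
    SELECT b.*,
           -- deal rows round-robin into sets 1..6 within each bucket
           (ROW_NUMBER() OVER (PARTITION BY b.bucket
                               ORDER BY b.[E-mail Address]) - 1) % 6 + 1 AS set_no
    FROM bucketed AS b
)
SELECT *
FROM numbered
WHERE set_no = 1;   -- change to 2..6 for the other sets
```

Because every row gets exactly one set_no, no address can appear in two sets. Replacing the final SELECT with a `GROUP BY set_no, bucket` count is a quick way to see how large each set actually is.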