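My table structure is shown below; the "Mail" column can contain multiple email addresses joined by commas.
Data (int)
Mail (varchar(200))
[Data] [Mail]
1 m1@gmail.com,m2@hotmail.com
2 m2@hotmail.com,m3@test.com
I need to generate a report like the one below, counting the rows in which each email address appears.
[Mail] [Count]
m1@gmail.com 1
m2@hotmail.com 2
m3@test.com 1
So what SQL (Server) query would generate the report above? I also can't change the table structure.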
Answer 0 (score: 3)
SQL Server solution
WITH T ([Data], [Mail])
AS (SELECT 1,'m1@gmail.com,m2@hotmail.com' UNION ALL
SELECT 2,'m2@hotmail.com,m3@test.com')
SELECT address AS Mail,
COUNT(*) AS [Count]
FROM T
CROSS APPLY (SELECT CAST('<m>' + REPLACE([Mail], ',', '</m><m>') + '</m>'
AS XML
) AS x) ca1
CROSS APPLY (SELECT T.split.value('.', 'varchar(200)') AS address
FROM x.nodes('/m') T(split)) ca
GROUP BY address
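The XML cast above works on older versions of SQL Server. As an alternative that is not part of the original answer, the built-in STRING_SPLIT function could do the splitting instead. The following is a minimal sketch, assuming SQL Server 2016 or later with database compatibility level 130 or higher; it reuses the same sample rows.

```sql
-- Assumption: SQL Server 2016+ at compatibility level 130 or higher,
-- where the built-in STRING_SPLIT table-valued function is available.
WITH T ([Data], [Mail])
     AS (SELECT 1, 'm1@gmail.com,m2@hotmail.com' UNION ALL
         SELECT 2, 'm2@hotmail.com,m3@test.com')
SELECT LTRIM(RTRIM(s.value)) AS Mail,      -- trim stray spaces around each address
       COUNT(*)              AS [Count]
FROM   T
       CROSS APPLY STRING_SPLIT(T.[Mail], ',') AS s  -- one output row per list element
GROUP  BY LTRIM(RTRIM(s.value));
```

Unlike the XML approach, this does not need the address text to be valid XML content, so characters such as & or < in the data would not cause a conversion error.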
Answer 1 (score: 2)
SQL Server, using a recursive CTE.
declare @Mail table (ID int, Mail varchar(200))
insert into @Mail values
(1, 'm1@gmail.com,m2@hotmail.com'),
(2, 'm2@hotmail.com,m3@test.com'),
(3, 'm2@hotmail.com')
;with cte1 as
(
select Mail+',' as Mail
from @Mail
),
cte2
as
(
select
left(Mail, charindex(',', Mail)-1) as Mail1,
right(Mail, len(Mail)-charindex(',', Mail)) as Mail
from cte1
union all
select
left(Mail, charindex(',', Mail)-1) as Mail1,
right(Mail, len(Mail)-charindex(',', Mail)) as Mail
from cte2
where charindex(',', Mail) > 1
)
select
Mail1 as Mail,
count(*) as [Count]
from cte2
group by Mail1
Edit 1
Same as before, but handles Mail
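Answer 2 (score: 1)
Very similar to Mikael's answer, with a few minor tweaks...
- It keeps a field with the "cached" LEN, to avoid recalculating the length repeatedly
- It replaces a CHARINDEX result of 0 with NULL (via NULLIF)
These differences only have a noticeable effect on long lists, where there are several levels of recursion.
The CROSS APPLY business is just there to make the SELECT tidier, rather than repeating NULLIF(CHARINDEX) several times.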
WITH
source (
Data,
Mail
)
AS
(
SELECT 1,'m1@gmail.com,m2@hotmail.com' UNION ALL
SELECT 2,'m2@hotmail.com,m3@test.com'
)
,
split_cte
AS
(
SELECT
LEFT (mail, ISNULL(comma - 1, LEN(mail))) AS "current_mail",
RIGHT(mail, ISNULL(LEN(mail) - comma, 0)) AS "mail_data",
ISNULL(LEN(mail) - comma, 0) AS "chars"
FROM
source
CROSS APPLY
(SELECT NULLIF(CHARINDEX(',', mail), 0) AS "comma") AS search
UNION ALL
SELECT
LEFT (mail_data, ISNULL(comma - 1, chars)) AS "current_mail",
RIGHT(mail_data, ISNULL(chars - comma, 0)) AS "mail_data",
ISNULL(chars - comma, 0) AS "chars"
FROM
split_cte
CROSS APPLY
(SELECT NULLIF(CHARINDEX(',', mail_data), 0) AS "comma") AS search
WHERE
chars > 0
)
SELECT
current_mail AS "Mail",
COUNT(*) AS "Count"
FROM
split_cte
GROUP BY
current_mail
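To make the NULLIF(CHARINDEX) and CROSS APPLY idea easier to follow in isolation, here is a small sketch that applies it to two literal strings. The derived table `s` and the alias `first_address` are illustrative names only; the VALUES constructor assumes SQL Server 2008 or later.

```sql
-- Sketch: compute the comma position once in CROSS APPLY, then reuse it.
-- NULLIF turns CHARINDEX's "not found" result (0) into NULL, so ISNULL
-- can fall back to the full string length when there is no comma.
SELECT s.mail,
       search.comma,
       LEFT(s.mail, ISNULL(search.comma - 1, LEN(s.mail))) AS first_address
FROM   (VALUES ('m1@gmail.com,m2@hotmail.com'),
               ('m3@single.com')) AS s (mail)
       CROSS APPLY (SELECT NULLIF(CHARINDEX(',', s.mail), 0) AS comma) AS search;
-- Row 1: comma = 13,   first_address = 'm1@gmail.com'
-- Row 2: comma = NULL, first_address = 'm3@single.com'
```

Because `comma` is computed in one place, the SELECT list can refer to it by name instead of repeating the NULLIF(CHARINDEX(...)) expression in every column.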
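Answer 3 (score: 1)
The right thing to do is add a related table to store the multiple email addresses. Storing things in a comma-delimited list is almost always a poor design decision, as you have discovered in trying to query it. It usually means you need a related table, because you have a one-to-many relationship. With properly related tables, what you are trying to accomplish is trivial.
I don't buy "I can't change the table structure" as an excuse. Unless this is a commercial product your company does not own, you can change the structure; you just need to show management why it is necessary. Someone in your organization can change the database structure, so find out who and convince them why the change is needed. If it is a commercial database, consider creating a trigger on the table that populates a related table whenever the email field is inserted, updated, or deleted. Then at least you only have to go through the splitting process once per record change, rather than every time you run the query.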
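As a rough illustration of the related-table design described above, the sketch below uses a hypothetical child table named dbo.SourceMail; neither that name nor the key choice comes from the question. With one address per row, the report reduces to a plain GROUP BY.

```sql
-- Hypothetical child table: one row per (parent row, email address).
-- The primary key assumes an address appears at most once per parent row.
CREATE TABLE dbo.SourceMail
(
    [Data] int          NOT NULL,  -- key of the parent row
    [Mail] varchar(200) NOT NULL,  -- exactly one address per row
    CONSTRAINT PK_SourceMail PRIMARY KEY ([Data], [Mail])
);

-- With the data normalized, no string splitting is needed for the report.
SELECT [Mail],
       COUNT(*) AS [Count]
FROM   dbo.SourceMail
GROUP  BY [Mail];
```

If the original table cannot be changed, a trigger on it could keep a table like this in sync, using any of the splitting queries from the other answers to populate it.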
Answer 4 (score: 1)
String splitting is faster using only CHARINDEX, with no XML or CTE.
Sample table
create table #tmp ([Data] int, [Mail] varchar(200))
insert #tmp SELECT 1,'m1@gmail.com,m2@hotmail.com,other, longer@test, fifth'
UNION ALL SELECT 2,'m2@hotmail.com,m3@test.com'
UNION ALL SELECT 3,'m3@single.com'
UNION ALL SELECT 4,''
UNION ALL SELECT 5,null
Query
select single, count(*) [Count]
from
(
select ltrim(rtrim(substring(t.mail, v.number+1,
isnull(nullif(charindex(',',t.mail,v.number+1),0)-v.number-1,200)))) single
from #tmp t
inner join master..spt_values v on v.type='p'
and v.number <= len(t.Mail)
and (substring(t.mail,v.number,1) = ',' or v.number=0)
) X
group by single
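The query above takes its number list from master..spt_values, which is an undocumented system table. A variation that builds the numbers inline might look like the sketch below. It assumes SQL Server 2008 or later (for the VALUES constructor), that Mail is at most 200 characters as declared in the sample table, and that #tmp already exists; the CTE names `d` and `numbers` are illustrative only.

```sql
-- Sketch: same split logic, with an inline number list (0..255)
-- standing in for master..spt_values.
WITH d (n) AS (SELECT n
               FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),
                            (8),(9),(10),(11),(12),(13),(14),(15)) AS v (n)),
     numbers (number) AS (SELECT a.n * 16 + b.n      -- 16 x 16 = 256 values
                          FROM d AS a CROSS JOIN d AS b)
SELECT single, COUNT(*) AS [Count]
FROM (
    -- Each matching number marks the position just before an element:
    -- either 0 (start of string) or the position of a comma.
    SELECT LTRIM(RTRIM(SUBSTRING(t.Mail, v.number + 1,
           ISNULL(NULLIF(CHARINDEX(',', t.Mail, v.number + 1), 0) - v.number - 1, 200)))) AS single
    FROM #tmp AS t
    INNER JOIN numbers AS v
            ON v.number <= LEN(t.Mail)
           AND (SUBSTRING(t.Mail, v.number, 1) = ',' OR v.number = 0)
) AS X
GROUP BY single;
```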
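The only parts you supply are
Answer 5 (score: 0)
This is a supplementary answer showing how the various options perform:
Populate a sample table with some data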
create table tmp1 ([Data] int, [Mail] varchar(200))
insert tmp1 SELECT 1,'m1@gmail.com,m2@hotmail.com,other, longer@test, fifth'
UNION ALL SELECT 2,'m2@hotmail.com,m3@test.com'
UNION ALL SELECT 3,'m3@single.com'
UNION ALL SELECT 4,''
UNION ALL SELECT 5,null
insert tmp1
select data*10000 + number, mail
from tmp1, master..spt_values v
where v.type='P'
-- total rows: 10245
Test queries:
set statistics io on
set statistics time on
dbcc dropcleanbuffers dbcc freeproccache
select single, count(*) [Count]
from
(
select ltrim(rtrim(substring(t.mail, v.number+1,
isnull(nullif(charindex(',',t.mail,v.number+1),0)-v.number-1,200)))) single
from tmp1 t
inner join master..spt_values v on v.type='p'
and v.number <= len(t.Mail)
and (substring(t.mail,v.number,1) = ',' or v.number=0)
) X
group by single
dbcc dropcleanbuffers dbcc freeproccache
;with cte1 as
(
select Mail+',' as Mail
from tmp1
),
cte2
as
(
select
left(Mail, charindex(',', Mail)-1) as Mail1,
right(Mail, len(Mail)-charindex(',', Mail)) as Mail
from cte1
union all
select
left(Mail, charindex(',', Mail)-1) as Mail1,
right(Mail, len(Mail)-charindex(',', Mail)) as Mail
from cte2
where charindex(',', Mail) > 1
)
select
Mail1 as Mail,
count(*) as [Count]
from cte2
group by Mail1
dbcc dropcleanbuffers dbcc freeproccache
--SET ANSI_DEFAULTS ON
--SET ANSI_NULLS ON
;
SELECT address AS Mail,
COUNT(*) AS [Count]
FROM tmp1
CROSS APPLY (SELECT CAST('<m>' + REPLACE([Mail], ',', '</m><m>') + '</m>'
AS XML
) AS x) ca1
CROSS APPLY (SELECT T.split.value('.', 'varchar(200)') AS address
FROM x.nodes('/m') T(split)) ca
GROUP BY address
Run them a few times to get a feel for the averages
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'spt_values'. Scan count 8196, logical reads 26637, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tmp1'. Scan count 3, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 641 ms, elapsed time = 412 ms.
Table 'Worktable'. Scan count 2, logical reads 103271, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tmp1'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 609 ms, elapsed time = 614 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tmp1'. Scan count 3, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2798 ms, elapsed time = 1421 ms.
Table 'Worktable'. Scan count 2, logical reads 103334, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tmp1'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 734 ms, elapsed time = 742 ms.
First (CHARINDEX): CPU time = 344 ms, elapsed time = 198 ms.
Second (CTE): CPU time = 594 ms, elapsed time = 613 ms.
Third (XML): CPU time = 2812 ms, elapsed time = 1418 ms.
Fourth (CTE2): CPU time = 719 ms, elapsed time = 750 ms.