Group by SQL query on a comma-joined column

Time: 2011-02-14 13:00:41

Tags: sql sql-server sql-server-2005 tsql

My table structure is shown below; the "Mail" column can contain multiple emails joined by commas.

Data (int)

Mail (varchar(200))

[Data] [Mail]

1 m1@gmail.com,m2@hotmail.com

2 m2@hotmail.com,m3@test.com

I need to generate a report like the one below, counting the rows in which each email appears:

  

[Mail] [Count]

m1@gmail.com 1

m2@hotmail.com 2

m3@test.com 1

So what SQL (Server) query would generate the report above? Also, I cannot change the table structure.

6 answers:

Answer 0: (score: 3)

SQL Server solution

WITH T ([Data], [Mail])
     AS (SELECT 1,'m1@gmail.com,m2@hotmail.com' UNION ALL
         SELECT 2,'m2@hotmail.com,m3@test.com')
SELECT address  AS Mail,
       COUNT(*) AS [Count]
FROM   T
       CROSS APPLY (SELECT CAST('<m>' + REPLACE([Mail], ',', '</m><m>') + '</m>'
                                AS XML
                           ) AS x) ca1
       CROSS APPLY (SELECT T.split.value('.', 'varchar(200)') AS address
                    FROM   x.nodes('/m') T(split)) ca
GROUP  BY address  

Answer 1: (score: 2)

SQL Server, using a recursive CTE.

declare @Mail table (ID int, Mail varchar(200))

insert into @Mail values
(1, 'm1@gmail.com,m2@hotmail.com'),
(2, 'm2@hotmail.com,m3@test.com'),
(3, 'm2@hotmail.com')

;with cte1 as
(
  select Mail+',' as Mail
  from @Mail
),
cte2
as
(
  select
    left(Mail, charindex(',', Mail)-1) as Mail1,
    right(Mail, len(Mail)-charindex(',', Mail)) as Mail
  from cte1
  union all
  select
    left(Mail, charindex(',', Mail)-1) as Mail1,
    right(Mail, len(Mail)-charindex(',', Mail)) as Mail
  from cte2
  where charindex(',', Mail) > 1
)
select
  Mail1 as Mail,
  count(*) as [Count]
from cte2
group by Mail1

Edit 1: Same as before, but handles the case where Mail contains only one email.

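Answer 2: (score: 1)

Very similar to Mikael's answer, with minor tweaks:
- There is a field with a 'cached' LEN, to avoid repeatedly calculating the length.
- Only one UNION is used per recursion, by replacing a CHARINDEX of 0 with NULL.

These differences only have a noticeable effect with long lists, and therefore with multiple levels of recursion.

The CROSS APPLY business is just there to make the SELECT tidier, rather than repeating NULLIF(CHARINDEX) several times.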


WITH
  source (
    Data,
    Mail
  )
AS
(
  SELECT 1,'m1@gmail.com,m2@hotmail.com' UNION ALL
  SELECT 2,'m2@hotmail.com,m3@test.com'
)
,
  split_cte
AS
(
  SELECT
    LEFT (mail, ISNULL(comma - 1, LEN(mail)))     AS "current_mail",
    RIGHT(mail, ISNULL(LEN(mail) - comma, 0))     AS "mail_data",
    ISNULL(LEN(mail) - comma, 0)                  AS "chars"
  FROM
    source
  CROSS APPLY
    (SELECT NULLIF(CHARINDEX(',', mail), 0) AS "comma") AS search

  UNION ALL

  SELECT
    LEFT (mail_data, ISNULL(comma - 1, chars))    AS "current_mail",
    RIGHT(mail_data, ISNULL(chars - comma, 0))    AS "mail_data",
    ISNULL(chars - comma, 0)                      AS "chars"
  FROM
    split_cte
  CROSS APPLY
    (SELECT NULLIF(CHARINDEX(',', mail_data), 0) AS "comma") AS search
  WHERE
    chars > 0
)

SELECT
  current_mail     AS "Mail",
  COUNT(*)         AS "Count"
FROM
  split_cte
GROUP BY
  current_mail
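
The NULLIF(CHARINDEX(...), 0) idiom used in the query above can be looked at in isolation. This is only a minimal sketch: the two sample strings and the alias s are made up for illustration and are not part of the original data.

```sql
-- Hypothetical sample values, chosen only to show the idiom.
SELECT s.mail,
       search.comma,  -- position of the first comma, or NULL when there is none
       LEFT(s.mail, ISNULL(search.comma - 1, LEN(s.mail))) AS first_item
FROM   (SELECT 'a@x.com,b@y.com' AS mail UNION ALL
        SELECT 'c@z.com') AS s
       -- CHARINDEX returns 0 when nothing is found; NULLIF turns that 0 into NULL
       -- so that ISNULL can supply a fallback length in the SELECT list.
       CROSS APPLY (SELECT NULLIF(CHARINDEX(',', s.mail), 0) AS comma) AS search
```

For the first string comma is 8 and first_item is 'a@x.com'; for the second, comma is NULL and the whole string 'c@z.com' comes back. Computing comma once in the CROSS APPLY is what keeps the SELECT list free of repeated NULLIF(CHARINDEX) calls.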

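Answer 3: (score: 1)

The right thing to do is to add a related table to store the multiple emails. Storing things in a comma-delimited list is almost always a poor design decision, as you have found out while trying to query it. It usually means you need a related table, because you have a one-to-many relationship. With properly related tables, what you are trying to accomplish would be trivial.

I don't buy "I can't change the table structure" as an excuse. Unless this is a commercial product that your company does not own, you can change the structure; you just need to show management why it is necessary. Somebody in your organization can change the database structure, so find out who and convince him why the change is needed. If it is a commercial database, consider creating a trigger on the table that populates a related table that you create each time the email field is inserted, updated, or deleted. Then at least you only go through the split process once per record change instead of every time you run the query.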
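
To make the "trivial" claim concrete, here is a rough sketch of what the related table could look like. The table name MailAddress and the column name Address are assumptions made for illustration; nothing in the question specifies them.

```sql
-- Hypothetical child table: one row per (Data, address) pair.
CREATE TABLE MailAddress
(
    [Data]  int          NOT NULL,
    Address varchar(200) NOT NULL,
    PRIMARY KEY ([Data], Address)
);

-- The same sample data as in the question, already split into rows.
INSERT INTO MailAddress ([Data], Address)
SELECT 1, 'm1@gmail.com'   UNION ALL
SELECT 1, 'm2@hotmail.com' UNION ALL
SELECT 2, 'm2@hotmail.com' UNION ALL
SELECT 2, 'm3@test.com';

-- With one address per row, the report is a plain GROUP BY.
SELECT Address  AS Mail,
       COUNT(*) AS [Count]
FROM   MailAddress
GROUP  BY Address;
```

With this layout no string splitting is needed at query time, and the primary key also prevents the same address from being stored twice for one Data value.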

Answer 4: (score: 1)

String splitting is faster using only CHARINDEX, without XML or a CTE.

Sample table

create table #tmp ([Data] int, [Mail] varchar(200))
insert #tmp SELECT 1,'m1@gmail.com,m2@hotmail.com,other, longer@test, fifth'
UNION ALL   SELECT 2,'m2@hotmail.com,m3@test.com'
UNION ALL   SELECT 3,'m3@single.com'
UNION ALL   SELECT 4,''
UNION ALL   SELECT 5,null

Query

select single, count(*) [Count]
from
(
    select ltrim(rtrim(substring(t.mail, v.number+1,
        isnull(nullif(charindex(',',t.mail,v.number+1),0)-v.number-1,200)))) single
    from #tmp t
    inner join master..spt_values v on v.type='p'
        and v.number <= len(t.Mail)
        and (substring(t.mail,v.number,1) = ',' or v.number=0)
) X
group by single

The only parts you need to supply are:

  • #tmp: your table name
  • Mail: the column name
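
To see how the join against master..spt_values finds the start of each item, it may help to run the same expressions against a single string. This is only a sketch: the variable @s and its value are made up for illustration, and it relies on the same undocumented spt_values number list (type 'P') that the query above uses.

```sql
DECLARE @s varchar(200);
SET @s = 'a@x.com,b@y.com,c@z.com';  -- hypothetical sample value

SELECT v.number AS separator_pos,    -- 0 for the first item, otherwise the position of a comma
       SUBSTRING(@s, v.number + 1,
                 -- length up to the next comma, or 200 when no further comma exists
                 ISNULL(NULLIF(CHARINDEX(',', @s, v.number + 1), 0) - v.number - 1, 200)) AS item
FROM   master..spt_values v
WHERE  v.type = 'P'
  AND  v.number <= LEN(@s)
  AND  (SUBSTRING(@s, v.number, 1) = ',' OR v.number = 0)
ORDER  BY v.number;
```

This returns three rows, with separator_pos 0, 8 and 16 and the items 'a@x.com', 'b@y.com' and 'c@z.com'. Each qualifying number marks the character just before an item, so the item starts at number + 1 and runs to the next comma or to the end of the string.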

Answer 5: (score: 0)

This is a supplementary answer showing how the various options perform:

Populate a sample table with some data

create table tmp1 ([Data] int, [Mail] varchar(200))
insert tmp1 SELECT 1,'m1@gmail.com,m2@hotmail.com,other, longer@test, fifth'
UNION ALL   SELECT 2,'m2@hotmail.com,m3@test.com'
UNION ALL   SELECT 3,'m3@single.com'
UNION ALL   SELECT 4,''
UNION ALL   SELECT 5,null

insert tmp1
select data*10000 + number, mail
from tmp1, master..spt_values v
where v.type='P'

-- total rows: 10245

Test queries:

set statistics io on
set statistics time on

dbcc dropcleanbuffers dbcc freeproccache

select single, count(*) [Count]
from
(
    select ltrim(rtrim(substring(t.mail, v.number+1,
        isnull(nullif(charindex(',',t.mail,v.number+1),0)-v.number-1,200)))) single
    from tmp1 t
    inner join master..spt_values v on v.type='p'
        and v.number <= len(t.Mail)
        and (substring(t.mail,v.number,1) = ',' or v.number=0)
) X
group by single

dbcc dropcleanbuffers dbcc freeproccache

;with cte1 as
(
  select Mail+',' as Mail
  from tmp1
),
cte2
as
(
  select
    left(Mail, charindex(',', Mail)-1) as Mail1,
    right(Mail, len(Mail)-charindex(',', Mail)) as Mail
  from cte1
  union all
  select
    left(Mail, charindex(',', Mail)-1) as Mail1,
    right(Mail, len(Mail)-charindex(',', Mail)) as Mail
  from cte2
  where charindex(',', Mail) > 1
)
select
  Mail1 as Mail,
  count(*) as [Count]
from cte2
group by Mail1

dbcc dropcleanbuffers dbcc freeproccache

--SET ANSI_DEFAULTS ON
--SET ANSI_NULLS ON
;
SELECT address  AS Mail,
       COUNT(*) AS [Count]
FROM   tmp1
       CROSS APPLY (SELECT CAST('<m>' + REPLACE([Mail], ',', '</m><m>') + '</m>'
                                AS XML
                           ) AS x) ca1
       CROSS APPLY (SELECT T.split.value('.', 'varchar(200)') AS address
                    FROM   x.nodes('/m') T(split)) ca
GROUP  BY address  

Statistics

Ran it a few times to get a feel for the averages

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'spt_values'. Scan count 8196, logical reads 26637, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tmp1'. Scan count 3, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 641 ms,  elapsed time = 412 ms.

Table 'Worktable'. Scan count 2, logical reads 103271, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tmp1'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 609 ms,  elapsed time = 614 ms.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tmp1'. Scan count 3, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2798 ms,  elapsed time = 1421 ms.

Table 'Worktable'. Scan count 2, logical reads 103334, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tmp1'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 734 ms,  elapsed time = 742 ms.

Summary

First (CHARINDEX): CPU time = 344 ms, elapsed time = 198 ms.
Second (CTE): CPU time = 594 ms, elapsed time = 613 ms.
Third (XML): CPU time = 2812 ms, elapsed time = 1418 ms.
Fourth (CTE2): CPU time = 719 ms, elapsed time = 750 ms.