Group consecutive records, count and delete

Time: 2013-09-17 13:00:15

Tags: sql sql-server ranking-functions

A slightly tricky SQL question (we are running SQL Server 2008 R2).

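In a log table I have to combine consecutive records that have the same message, count how many messages were combined, and delete the combined records.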

To make this easier to understand and visualize, here is a small data sample:

ID  DATE       MSG  COUNT
1   2013-08-17 mail NULL
2   2013-08-17 mail NULL
3   2013-08-17 www  NULL
4   2013-08-18 www  NULL
5   2013-08-18 www  NULL
6   2013-08-18 www  NULL
7   2013-08-18 mail NULL
8   2013-08-18 www  NULL
9   2013-08-19 mail NULL
10  2013-08-19 mail NULL
11  2013-08-20 mail NULL
12  2013-08-20 mail NULL
13  2013-08-21 www  NULL
14  2013-08-22 mail NULL
15  2013-08-22 mail NULL
16  2013-08-23 mail NULL
17  2013-08-23 mail NULL
18  2013-08-23 mail NULL

The result should look like this:

ID  DATE       MSG  COUNT
1   2013-08-17 mail NULL
2   2013-08-17 mail NULL
3   2013-08-17 www  NULL
6   2013-08-18 www  3
7   2013-08-18 mail 1
8   2013-08-18 www  1
12  2013-08-20 mail 4
13  2013-08-21 www  1
15  2013-08-22 mail 2
16  2013-08-23 mail NULL
17  2013-08-23 mail NULL
18  2013-08-23 mail NULL

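So, basically, the query should

  1. only process data in a given date range (in this example 2013-08-18 to 2013-08-22)
  2. combine consecutive rows based on the text of the field MSG
  3. count the combined rows and set the value in the field COUNT
  4. delete the combined records (in this example, e.g. ID 6 stays, but ID 5 and ID 4 should be deleted)

Since I'm not an SQL expert, I would really appreciate any help, suggestion or SQL query.

Thanks in advance...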
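The requirements above describe a "gaps and islands" problem: rows 9 to 12 span two dates but form one run (ID 12, COUNT 4), while ID 8 is a separate run from IDs 4 to 6 even though all four share a date and MSG. A minimal sketch for SQL Server 2008 R2, assuming "consecutive" means consecutive by ID and using a temp table #log as a hypothetical stand-in for the real log table: the difference between ROW_NUMBER() over all rows in the range and ROW_NUMBER() per MSG stays constant inside a run, so (MSG, grp) identifies each run; the highest ID of a run keeps the count and the rest are deleted.

```sql
-- Sketch for SQL Server 2008 R2. #log is a hypothetical stand-in for the real log table.
IF OBJECT_ID('tempdb..#log') IS NOT NULL DROP TABLE #log;
IF OBJECT_ID('tempdb..#runs') IS NOT NULL DROP TABLE #runs;

CREATE TABLE #log (ID int PRIMARY KEY, [DATE] date, MSG varchar(50), [COUNT] int NULL);
INSERT INTO #log (ID, [DATE], MSG) VALUES
    (1, '2013-08-17', 'mail'),  (2, '2013-08-17', 'mail'),  (3, '2013-08-17', 'www'),
    (4, '2013-08-18', 'www'),   (5, '2013-08-18', 'www'),   (6, '2013-08-18', 'www'),
    (7, '2013-08-18', 'mail'),  (8, '2013-08-18', 'www'),   (9, '2013-08-19', 'mail'),
    (10, '2013-08-19', 'mail'), (11, '2013-08-20', 'mail'), (12, '2013-08-20', 'mail'),
    (13, '2013-08-21', 'www'),  (14, '2013-08-22', 'mail'), (15, '2013-08-22', 'mail'),
    (16, '2013-08-23', 'mail'), (17, '2013-08-23', 'mail'), (18, '2013-08-23', 'mail');

DECLARE @fromDate date = '2013-08-18', @toDate date = '2013-08-22';

-- Number the rows in the range twice: overall by ID, and per MSG by ID.
-- Inside a run of consecutive rows with the same MSG both numbers grow together,
-- so their difference (grp) is constant; (MSG, grp) identifies one run.
WITH numbered AS (
    SELECT ID, MSG,
           ROW_NUMBER() OVER (ORDER BY ID)
         - ROW_NUMBER() OVER (PARTITION BY MSG ORDER BY ID) AS grp
    FROM #log
    WHERE [DATE] BETWEEN @fromDate AND @toDate
)
SELECT ID,
       COUNT(*) OVER (PARTITION BY MSG, grp) AS cnt,    -- size of the run
       MAX(ID)  OVER (PARTITION BY MSG, grp) AS keepId  -- last row of the run survives
INTO #runs
FROM numbered;

BEGIN TRANSACTION;

-- Store the run size on the surviving row
UPDATE l SET [COUNT] = r.cnt
FROM #log AS l
JOIN #runs AS r ON r.ID = l.ID
WHERE r.ID = r.keepId;

-- Delete the other rows of each run
DELETE l
FROM #log AS l
JOIN #runs AS r ON r.ID = l.ID
WHERE r.ID <> r.keepId;

COMMIT TRANSACTION;

SELECT ID, [DATE], MSG, [COUNT] FROM #log ORDER BY ID;
```

On the sample data this leaves IDs 1, 2, 3, 6 (3), 7 (1), 8 (1), 12 (4), 13 (1), 15 (2), 16, 17 and 18, i.e. the expected result above. Rows outside the date range are never numbered, which is why ID 3 is not merged into the run 4 to 6. If "consecutive" should follow another column than ID, change the two ORDER BY clauses.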

3 answers:

Answer 0 (score: 1)

My idea is to do it with 2 queries:

(i) The first one only counts and updates the records.

(ii) The second one deletes all records in the date range that have a NULL value in the COUNT column.

EDIT: I did step (i), but I could not keep NULL in COUNT for the rows that should be deleted: the query updates every row in the range with its COUNT. Now you just need a DELETE for the correct rows.

Step (i)

(works for MySQL)

UPDATE tab ta
JOIN (SELECT date, msg, COUNT(*) AS cnt FROM tab GROUP BY date, msg) tb
SET ta.count = tb.cnt
WHERE ta.date = tb.date AND ta.msg = tb.msg
  AND ta.date BETWEEN DATE('2013-08-18') AND DATE('2013-08-21');

PS: The UPDATE syntax I used works for MySQL; you will probably have to adapt it for MS SQL Server.

(for MS SQL Server)

UPDATE ta SET ta.count = tb.cnt
FROM tab ta, (SELECT date, msg, COUNT(*) AS cnt FROM tab GROUP BY date, msg) tb
WHERE ta.date = tb.date AND ta.msg = tb.msg
  AND ta.date BETWEEN CAST('2013-08-18' AS DATE) AND CAST('2013-08-20' AS DATE);
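Step (ii) is left open in the answer. A minimal SQL Server sketch of that DELETE, assuming the table also has an id column (not shown in the answer) and that, as in the question's expected output, the highest id of each (date, msg) group is the row to keep. #tab is a hypothetical stand-in for tab, pre-filled the way step (i) would leave it. Like step (i), it groups by date and msg, not by consecutive runs.

```sql
-- #tab: hypothetical stand-in for the answer's "tab"; the id column is an assumption.
IF OBJECT_ID('tempdb..#tab') IS NOT NULL DROP TABLE #tab;
CREATE TABLE #tab (id int PRIMARY KEY, [date] date, msg varchar(50), [count] int NULL);

-- Rows as step (i) would leave them for the range 2013-08-18 to 2013-08-20
INSERT INTO #tab (id, [date], msg, [count]) VALUES
    (3,  '2013-08-17', 'www',  NULL),
    (4,  '2013-08-18', 'www',  4),
    (5,  '2013-08-18', 'www',  4),
    (6,  '2013-08-18', 'www',  4),
    (7,  '2013-08-18', 'mail', 1),
    (8,  '2013-08-18', 'www',  4),
    (9,  '2013-08-19', 'mail', 2),
    (10, '2013-08-19', 'mail', 2);

-- Step (ii): inside the range, delete every row that is not the highest id of its (date, msg) group
DELETE ta
FROM #tab AS ta
WHERE ta.[date] BETWEEN CAST('2013-08-18' AS date) AND CAST('2013-08-20' AS date)
  AND ta.id < (SELECT MAX(tb.id)
               FROM #tab AS tb
               WHERE tb.[date] = ta.[date] AND tb.msg = ta.msg);

SELECT id, [date], msg, [count] FROM #tab ORDER BY id;
```

With this sample it leaves ids 3, 7, 8 (count 4) and 10 (count 2). Because the www rows 4, 5, 6 and 8 share the date 2013-08-18 they collapse into one row here, whereas the question expects ID 6 (COUNT 3) and ID 8 (COUNT 1) separately.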

Answer 1 (score: 1)

Try this:

IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
GO
select
    * 
into #temp
from (
    -- typed values: with an int id, 9 sorts before 10 (as strings, '10' would sort before '9'),
    -- and [count] is a real NULL instead of the string 'NULL'
    select 1 as id, cast('2013-08-17' as date) as [date], cast('mail' as varchar(50)) as msg, cast(null as int) as [count] union all
    select 2,'2013-08-17','mail',null union all
    select 3,'2013-08-17','www',null union all
    select 4,'2013-08-18','www',null union all
    select 5,'2013-08-18','www',null union all
    select 6,'2013-08-18','www',null union all
    select 7,'2013-08-18','mail',null union all
    select 8,'2013-08-18','www',null union all
    select 9,'2013-08-19','mail',null union all
    select 10,'2013-08-19','mail',null union all
    select 11,'2013-08-20','mail',null union all
    select 12,'2013-08-20','mail',null union all
    select 13,'2013-08-21','www',null union all
    select 14,'2013-08-22','mail',null union all
    select 15,'2013-08-22','mail',null union all
    select 16,'2013-08-23','mail',null union all
    select 17,'2013-08-23','mail',null union all
    select 18,'2013-08-23','mail',null
) x
GO


select 
    t.*,
    rwn
from #temp t
join (
    select 
        id, [date], [msg], [rwn] = row_number() over(partition by [date], [msg] order by id )
    from #temp
    where 1=1
        and [date] between '2013-08-18' and '2013-08-22'
) x
    on t.id=x.id
 order by 
    t.date, t.msg

Just turn it into an UPDATE, then delete all rows with rwn > 1.
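One way to turn the query above into that UPDATE and DELETE is an updatable CTE (SQL Server lets you UPDATE or DELETE through a CTE that contains ROW_NUMBER()). This sketch numbers rows with ORDER BY id DESC so that rwn = 1 is the highest ID, the row the question keeps; #t and its rows are a hypothetical sample. It keeps the answer's PARTITION BY [date], msg, so it merges rows per day rather than per consecutive run.

```sql
-- #t: hypothetical sample table with proper types (int id, date, int count)
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t (id int PRIMARY KEY, [date] date, msg varchar(50), [count] int NULL);
INSERT INTO #t (id, [date], msg) VALUES
    (4, '2013-08-18', 'www'),  (5, '2013-08-18', 'www'),  (6, '2013-08-18', 'www'),
    (7, '2013-08-18', 'mail'), (9, '2013-08-19', 'mail'), (10, '2013-08-19', 'mail');

-- rwn = 1 marks the highest id of each (date, msg) group; store the group size there
WITH x AS (
    SELECT [count],
           rwn = ROW_NUMBER() OVER (PARTITION BY [date], msg ORDER BY id DESC),
           cnt = COUNT(*) OVER (PARTITION BY [date], msg)
    FROM #t
    WHERE [date] BETWEEN '2013-08-18' AND '2013-08-22'
)
UPDATE x SET [count] = cnt WHERE rwn = 1;

-- Delete the remaining rows of each group
WITH x AS (
    SELECT rwn = ROW_NUMBER() OVER (PARTITION BY [date], msg ORDER BY id DESC)
    FROM #t
    WHERE [date] BETWEEN '2013-08-18' AND '2013-08-22'
)
DELETE FROM x WHERE rwn > 1;

SELECT id, [date], msg, [count] FROM #t ORDER BY id;
```

On the question's full data, rows 9 to 12 would end up as two rows (10 and 12, COUNT 2 each) instead of one row with COUNT 4, because 9 and 10 are dated 2013-08-19 and 11 and 12 are dated 2013-08-20.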

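EDIT: Your data type is probably text, which is why you get errors when sorting or comparing. Do you really need text? It is a large-object data type (blob) that can store up to 2 GB of text. Try changing it to varchar(8000), or, if these really are such large messages, varchar(max) will do as well.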
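A minimal sketch of that change on a hypothetical temp table #msgs. According to the ALTER TABLE documentation, a text column can only be altered directly to varchar(max), nvarchar(max) or xml, so the sketch converts to varchar(max); a second ALTER COLUMN to varchar(8000) is possible afterwards if every value fits. The GO separator (understood by SSMS and sqlcmd) makes the SELECT compile after the column type has changed.

```sql
-- #msgs: hypothetical table with a legacy text column
IF OBJECT_ID('tempdb..#msgs') IS NOT NULL DROP TABLE #msgs;
CREATE TABLE #msgs (ID int, MSG text);
INSERT INTO #msgs (ID, MSG) VALUES (1, 'mail'), (2, 'mail'), (3, 'www');

-- While MSG is text, GROUP BY, ORDER BY and = on it are rejected; convert it first
ALTER TABLE #msgs ALTER COLUMN MSG varchar(max);
GO

-- Grouping and sorting on MSG now work
SELECT MSG, COUNT(*) AS cnt FROM #msgs GROUP BY MSG ORDER BY MSG;
```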

Answer 2 (score: 1)

Hi, please try this, I hope it helps. The way I understood it, you need to group, delete the duplicates and keep one row. Sorry for my English.

DECLARE @Table_2 TABLE (ID INT, [DATE] date, MSG Varchar(50), [COUNT] int)
Declare @fromDate as date = '2013-08-18'
Declare @toDate as date = '2013-08-22'

INSERT INTO @Table_2 (ID, [DATE], MSG, [COUNT])
SELECT     MAX(ID) AS ID, DATE, MSG, COUNT(DATE) AS COUNT
FROM         dbo.Table_1
where [DATE] between @fromDate and @toDate
GROUP BY DATE, MSG



-- Put the group size on the row with the highest ID of each (DATE, MSG) group
UPDATE T1
SET [COUNT] = T2.[COUNT]
FROM Table_1 AS T1
INNER JOIN @Table_2 AS T2
    ON T1.ID = T2.ID


-- Delete the other rows of each group (@Table_2 only holds groups from the date range)
DELETE T1
FROM Table_1 AS T1
INNER JOIN @Table_2 AS T2
    ON T1.[DATE] = T2.[DATE] AND T1.MSG = T2.MSG
WHERE T1.ID <> T2.ID