Grouping query records by proximity of values

Time: 2013-05-22 12:38:38

Tags: sql-server sql-server-2008

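I have an audit log table that records changes to multiple tables against a business object. The log is timestamped, and the updates for a single business object are likely to be spread over a short period. In other words, saving a contract might take 5 seconds, and the records added or updated during the save will span that interval. In the sample below, the last column shows the timestamp, with slight differences between the values.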

cm_contract 1087    2013-05-20 14:30:24.713
cm_contract 1087    2013-05-20 14:30:24.717
cm_contract 1087    2013-05-20 14:30:24.750
cm_contract 1087    2013-05-20 14:30:24.763
cm_contract 1087    2013-05-20 14:30:24.817
cm_contract 1087    2013-05-20 14:30:24.833
cm_contract 1087    2013-05-20 14:30:24.837
cm_contract 1087    2013-05-20 14:30:24.843
cm_contract 1087    2013-05-20 14:30:24.850
cm_contract 1087    2013-05-20 14:30:24.853

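In the viewer, I want to summarise the data, showing which business object changed and how many log entries were written against it. To do that, I need to group records by business object table and key, and then by similar timestamps. I achieved this using a temp table and a variable, but I would ultimately like to put it in a view, so I am wondering whether there is a simpler way to do it: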

SELECT ROW_NUMBER() OVER (ORDER BY business_object_table, business_object_key, mod_date) AS row_num, 
        audit_trail_key, business_object_table, business_object_key, mod_date, 0 AS part
INTO #temp
FROM audit_trail WHERE business_object_table is not null

DECLARE @part INT=0

UPDATE t2 SET @part = CASE WHEN ABS(DATEDIFF(millisecond, t2.mod_date, t1.mod_date)) < 1000 THEN @part ELSE @part + 1 END, part = @part
FROM #temp t2 INNER JOIN #temp t1 ON t2.row_num = t1.row_num +1
WHERE t2.business_object_table = t1.business_object_table 
AND t2.business_object_key = t1.business_object_key

SELECT * FROM #temp 

DROP TABLE #temp

I have been looking for something in T-SQL along the lines of

CLUSTER BY ABS(DATEDIFF(millisecond, t2.mod_date, t1.mod_date)) < 1000

but my searches keep redirecting me to SQL Server failover clustering, which is not what I want. Does anyone have any ideas?

4 Answers:

Answer 0: (score: 1)

Maybe try something like this as a start ("pseudo-SQL", untested):

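-- For each row, measure the gap to the next row of the same contract and
-- keep rows followed by a gap of more than 60 seconds (the last row of each burst).
select t1.myRowId, t1.contractId,
       datediff(second, t1.[timestamp], min(t2.[timestamp])) as DeltaT
from myTable t1
inner join myTable t2 on t1.contractId = t2.contractId and t2.[timestamp] > t1.[timestamp]
group by t1.myRowId, t1.contractId, t1.[timestamp]
having datediff(second, t1.[timestamp], min(t2.[timestamp])) > 60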

Answer 1: (score: 1)

SELECT CAST(CONVERT(datetime,mod_date)as float)

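will give you a double (floating-point) representation of the date.

You can then scale that value and discard some decimal places to get a kind of "similar timestamp".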
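As a rough sketch of that idea, assuming the audit_trail columns from the question: in SQL Server, casting a datetime to float gives days since 1900-01-01, so multiplying by 86400 and flooring yields a whole-second bucket. The one-second bucket size and the alias second_bucket are illustrative choices, not part of the original answer.

```sql
-- Bucket audit rows into whole seconds using the float representation.
-- 86400 = seconds per day; FLOOR discards the sub-second part.
SELECT business_object_table,
       business_object_key,
       FLOOR(CAST(CONVERT(datetime, mod_date) AS float) * 86400) AS second_bucket,
       COUNT(*) AS no_of_changes
FROM audit_trail
WHERE business_object_table IS NOT NULL
GROUP BY business_object_table,
         business_object_key,
         FLOOR(CAST(CONVERT(datetime, mod_date) AS float) * 86400);
```

Fixed buckets like this will split a burst that happens to straddle a bucket boundary, and floating-point rounding may misplace values that sit very close to a boundary.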

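However, you seem to be missing a fundamental piece of the business logic: some identifier that is common to all rows taking part in a transaction. Without that you are guessing, and any kind of multi-user activity will cause problems. I would abandon the time-based approach and propagate some kind of transaction identifier through the log (*not a SQL transaction, a business transaction).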
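A minimal sketch of what that could look like, assuming the application can generate one identifier per save and write it to every audit row. The column name business_txn_id and its uniqueidentifier type are hypothetical; nothing in the question says such a column exists.

```sql
-- Hypothetical column: one value per business-level save, set by the application.
ALTER TABLE audit_trail ADD business_txn_id uniqueidentifier NULL;
GO  -- batch separator (SSMS/sqlcmd), so the new column is visible to the query below

-- Grouping then needs no timestamp heuristics at all.
SELECT business_object_table,
       business_object_key,
       business_txn_id,
       MIN(mod_date) AS first_mod,
       COUNT(*)      AS no_of_changes
FROM audit_trail
WHERE business_txn_id IS NOT NULL
GROUP BY business_object_table, business_object_key, business_txn_id;
```

Rows logged before the column was added would have a NULL identifier and would still need a time-based fallback.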

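Answer 2: (score: 1)

I suggest casting the timestamp to smalldatetime, which has one-minute precision, so nearby timestamps round to the same value. That replaces your CASE statement, and COUNT will do the rest.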

SELECT business_object_table, business_object_key, cast (mod_date as smalldatetime) as mod_date, count (*) as No_of_Changes
FROM audit_trail 
WHERE business_object_table is not null
GROUP BY business_object_table, business_object_key, cast (mod_date as smalldatetime)
ORDER BY 3,1,2

Hope this helps.
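One caveat worth checking before relying on this: SQL Server rounds to the nearest minute when converting datetime to smalldatetime, so two rows one second apart can end up in different groups. The two literals below are made-up values chosen to sit either side of the half-minute mark.

```sql
-- Two timestamps one second apart, on either side of the :30 rounding point.
-- The first should round down to 14:30 and the second up to 14:31.
SELECT CAST(CAST('2013-05-20T14:30:29.000' AS datetime) AS smalldatetime) AS earlier_row,
       CAST(CAST('2013-05-20T14:30:30.000' AS datetime) AS smalldatetime) AS later_row;
```

Conversely, unrelated saves of the same object that fall within the same rounded minute would be counted together.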

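Answer 3: (score: 0)

The solution I came up with is similar to the accepted answer above, but it copes better with varying batch sizes because it uses the time between two consecutive updates rather than all updates within a fixed period. I expect a similar solution could also be used for clustering geographic data, so I thought someone might find it interesting.

In summary, I use a common table expression to generate a row number for each audit record, ordered by business object type, key and record date.

I then use a second CTE to pull from cte1 every record whose previous update was more than a second earlier, generating a sequential row number for that result set at the same time.

Finally, I self-join cte2 to retrieve a summary of all audit records between each pair of consecutive records. Sample code follows: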

-- Audit record row numbers by business object table, key and mod_date
WITH cte1 AS (SELECT ROW_NUMBER() OVER (ORDER BY business_object_table, business_object_key, mod_date) AS row_num, audit_trail_key,
        business_object_table, business_object_key, mod_date, user_key, business_object_name FROM audit_trail),
-- Get audit records where previous update was more than a second prior, and include the first audit record
cte2 AS (SELECT ROW_NUMBER() OVER (ORDER BY a2.row_num) AS row_num, a2.audit_trail_key,
        a2.business_object_table, a2.business_object_key, a2.mod_date, a2.user_key, a2.business_object_name
    FROM cte1 a2 LEFT JOIN cte1 a1 ON a2.row_num = a1.row_num + 1 AND a2.business_object_table = a1.business_object_table
        AND a2.business_object_key = a1.business_object_key
    WHERE (ABS(DATEDIFF(ss, a1.mod_date, a2.mod_date)) > 1 OR a1.audit_trail_key IS NULL)
    AND a2.business_object_table is not null)
-- Summarise details within each cluster    
SELECT a1.audit_trail_key, a1.business_object_table, a1.business_object_key, a1.mod_date AS first_mod, 
    u.username, a1.user_key, a1.business_object_name, 
    (SELECT audit_rec_table + ': ' + CAST(COUNT(*) AS VARCHAR) + ' record' +
        CASE WHEN COUNT(*) > 1 THEN 's' ELSE '' END + CASE mod_type WHEN 'U' THEN ' Changed' WHEN 'D' 
                THEN ' Deleted' ELSE ' Added' END + CHAR(10) FROM audit_trail
        WHERE business_object_table = a1.business_object_table AND business_object_key = a1.business_object_key 
                AND mod_date >= a1.mod_date AND mod_date < ISNULL(a2.mod_date, mod_date + 1)
        GROUP BY audit_rec_table, mod_type
        FOR XML PATH('')) AS change_summary
FROM cte2 a1 LEFT JOIN cte2 a2 ON a2.row_num = a1.row_num + 1 AND a2.business_object_table = a1.business_object_table
        AND a2.business_object_key = a1.business_object_key
    INNER JOIN su_user u ON u.user_key = a1.user_key
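For readers on SQL Server 2012 or later (an assumption, since the question is tagged sql-server-2008, where LAG and ordered window aggregates are not available), the same gap-based clustering can be sketched without self-joins. The names flagged, numbered, starts_cluster and cluster_no are illustrative, and the one-second threshold is taken from the question.

```sql
-- Gap-based clustering with window functions (requires SQL Server 2012+).
WITH flagged AS (
    -- Mark a row as the start of a cluster when the previous row for the same
    -- business object is missing or at least one second older.
    SELECT audit_trail_key, business_object_table, business_object_key, mod_date,
           CASE WHEN mod_date < DATEADD(second, 1,
                         LAG(mod_date) OVER (PARTITION BY business_object_table, business_object_key
                                             ORDER BY mod_date, audit_trail_key))
                THEN 0 ELSE 1 END AS starts_cluster
    FROM audit_trail
    WHERE business_object_table IS NOT NULL
),
numbered AS (
    -- A running total of the start flags gives each cluster its own number.
    SELECT audit_trail_key, business_object_table, business_object_key, mod_date,
           SUM(starts_cluster) OVER (PARTITION BY business_object_table, business_object_key
                                     ORDER BY mod_date, audit_trail_key
                                     ROWS UNBOUNDED PRECEDING) AS cluster_no
    FROM flagged
)
SELECT business_object_table, business_object_key, cluster_no,
       MIN(mod_date) AS first_mod,
       MAX(mod_date) AS last_mod,
       COUNT(*)      AS no_of_changes
FROM numbered
GROUP BY business_object_table, business_object_key, cluster_no;
```

Because it uses only CTEs, a query of this shape could be wrapped in a view. Comparing against DATEADD rather than calling DATEDIFF(millisecond, ...) sidesteps the integer overflow that DATEDIFF raises when two consecutive rows are more than about 24 days apart.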