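I am looking for ideas and T-SQL solutions for combining consecutive records, as in the example below.
The source database I am working with holds audit records, with a column named "Audit_Type". It can contain many different values, such as "Saved Form", "Exported Record", "Imported Record", or "Viewed Record". The database ends up with a lot of extraneous "Saved Form" records, because the application that uses it auto-saves the form periodically while the user edits it. So there will often be a run of consecutive "Saved Form" records. Picture this: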
ID Audit Type DateTime
1 "Viewed Record" 2017-01-03 11:16:33.000
2 "Saved Form" 2017-01-04 09:51:36.837
3 "Saved Form" 2017-01-04 09:52:40.837
4 "Saved Form" 2017-01-04 09:52:44.837
5 "Saved Form" 2017-01-04 09:52:49.837
6 "Saved Form" 2017-01-04 09:52:54.837
7 "Saved Form" 2017-01-04 09:54:59.837
8 "Exported Record" 2017-01-04 09:55:59.837
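Question 1. I would like to collapse each run of consecutive "Saved Form" records into a single record that carries the timestamp of the last "Saved Form" in the run, before loading the data into my target database. Something like this: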
ID Audit Type DateTime
1 "Viewed Record" 2017-01-03 11:16:33.000
7 "Saved Form" 2017-01-04 09:54:59.837
8 "Exported Record" 2017-01-04 09:55:59.837
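I have tried a few approaches so far, but I would like to hear some ideas.
Question 2. From my research and from reading similar questions on SO, this looks like it might be a gaps-and-islands problem. Is that an accurate description?
EDIT: This is for SQL Server 2012. I am extracting from a database and have no control over how it logs information.
Also, to clarify: the log table has other columns that I omitted for brevity, so in the example above we can assume that all of the "Saved Form" records come from the same session and the same user.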
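Answer 0: (score: 1)
Note 1: I also added a hypothetical SessionID and UserID to my query. Other columns may change your final result.
Note 2: SQL Fiddle reports that the ROW_NUMBER version runs faster with fewer "simultaneous" entries, but the LEAD version is faster with many "simultaneous" entries.
MS SQL Server 2014 Schema Setup: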
CREATE TABLE foo ( ID int IDENTITY, sessionID int, userid int, AuditType varchar(50), [DateTime] datetime ) ;
INSERT INTO foo ( sessionID, userID, AuditType, [DateTime] )
VALUES
(1,1,'Viewed Record','2017-01-03 11:16:33.000')
, (1,1,'Saved Form','2017-01-04 09:51:36.837')
, (2,2,'Viewed Record','2017-01-04 09:52:00.000')
, (1,1,'Saved Form','2017-01-04 09:52:40.837')
, (1,1,'Saved Form','2017-01-04 09:52:44.837')
, (2,2,'Saved Form','2017-01-04 09:52:45.000')
, (2,2,'Saved Form','2017-01-04 09:52:46.000')
, (2,2,'Saved Form','2017-01-04 09:52:47.000')
, (2,2,'Saved Form','2017-01-04 09:52:48.000')
, (1,1,'Saved Form','2017-01-04 09:52:49.837')
, (1,1,'Saved Form','2017-01-04 09:52:54.837')
, (2,2,'Exported Record','2017-01-04 09:53:00.000')
, (1,1,'Saved Form','2017-01-04 09:54:59.837')
, (1,1,'Exported Record','2017-01-04 09:55:59.837')
, (2,1,'Viewed Record','2017-01-04 10:00:00.000')
, (2,1,'Saved Form','2017-01-04 10:02:00.000')
, (2,1,'Saved Form','2017-01-04 10:04:00.000')
, (2,1,'Saved Form','2017-01-04 10:06:00.000')
, (2,1,'Exported Record','2017-01-04 10:10:00.000')
;
Query 1 (LEAD()):
SELECT s1.sessionID
, s1.userID
, s1.AuditType
, s1.[DateTime]
FROM (
SELECT foo.*
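-- Partition per user/session so a row is never compared with the first row of the next user or session
, LEAD(foo.AuditType) OVER ( PARTITION BY foo.userID, foo.sessionID ORDER BY foo.[DateTime] ) AS next_type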
FROM foo
) s1
WHERE s1.next_type IS NULL OR s1.next_type <> s1.AuditType
ORDER BY s1.sessionID, s1.userID, s1.[DateTime]
Results:
| sessionID | userID | AuditType | DateTime |
|-----------|--------|-----------------|--------------------------|
| 1 | 1 | Viewed Record | 2017-01-03T11:16:33Z |
| 1 | 1 | Saved Form | 2017-01-04T09:54:59.837Z |
| 1 | 1 | Exported Record | 2017-01-04T09:55:59.837Z |
| 2 | 1 | Viewed Record | 2017-01-04T10:00:00Z |
| 2 | 1 | Saved Form | 2017-01-04T10:06:00Z |
| 2 | 1 | Exported Record | 2017-01-04T10:10:00Z |
| 2 | 2 | Viewed Record | 2017-01-04T09:52:00Z |
| 2 | 2 | Saved Form | 2017-01-04T09:52:48Z |
| 2 | 2 | Exported Record | 2017-01-04T09:53:00Z |
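Query 2 (ROW_NUMBER()). This keeps the latest row per AuditType within each user/session, so it matches Query 1 only when each type forms a single consecutive run per session, as it does in the sample data: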
SELECT s1.*
FROM (
SELECT foo.*
, ROW_NUMBER() OVER ( PARTITION BY foo.userID, foo.sessionID, foo.AuditType ORDER BY foo.[DateTime] DESC ) AS rn
FROM foo
) s1
WHERE rn = 1
ORDER BY s1.sessionID, s1.userID, s1.[DateTime]
Results:
| ID | sessionID | userid | AuditType | DateTime | rn |
|----|-----------|--------|-----------------|--------------------------|----|
| 1 | 1 | 1 | Viewed Record | 2017-01-03T11:16:33Z | 1 |
| 13 | 1 | 1 | Saved Form | 2017-01-04T09:54:59.837Z | 1 |
| 14 | 1 | 1 | Exported Record | 2017-01-04T09:55:59.837Z | 1 |
| 15 | 2 | 1 | Viewed Record | 2017-01-04T10:00:00Z | 1 |
| 18 | 2 | 1 | Saved Form | 2017-01-04T10:06:00Z | 1 |
| 19 | 2 | 1 | Exported Record | 2017-01-04T10:10:00Z | 1 |
| 3 | 2 | 2 | Viewed Record | 2017-01-04T09:52:00Z | 1 |
| 9 | 2 | 2 | Saved Form | 2017-01-04T09:52:48Z | 1 |
| 12 | 2 | 2 | Exported Record | 2017-01-04T09:53:00Z | 1 |
Both should return:
1,1,'Viewed Record','2017-01-03 11:16:33.000'
1,1,'Saved Form','2017-01-04 09:54:59.837'
1,1,'Exported Record','2017-01-04 09:55:59.837'
2,1,'Viewed Record','2017-01-04 10:00:00.000'
2,1,'Saved Form','2017-01-04 10:06:00.000'
2,1,'Exported Record','2017-01-04 10:10:00.000'
2,2,'Viewed Record','2017-01-04 09:52:00.000'
2,2,'Saved Form','2017-01-04 09:52:48.000'
2,2,'Exported Record','2017-01-04 09:53:00.000'
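If the same AuditType can appear in more than one run within a session (for example Saved Form, Viewed Record, Saved Form), a gaps-and-islands grouping keeps each run separate. The following is only a sketch against the foo table from the schema setup above, using the common "difference of two row numbers" technique. It assumes ID is a reasonable tiebreaker for equal timestamps and that ID increases with DateTime inside a run; the names grp and rows_in_run are made up for illustration.

```sql
-- Sketch only: assumes the foo table defined in the schema setup above.
-- Within one user/session, rows belonging to the same consecutive run of an
-- AuditType share the same difference between the two row numbers (grp).
WITH numbered AS (
    SELECT foo.*
         , ROW_NUMBER() OVER ( PARTITION BY foo.userID, foo.sessionID
                               ORDER BY foo.[DateTime], foo.ID )
         - ROW_NUMBER() OVER ( PARTITION BY foo.userID, foo.sessionID, foo.AuditType
                               ORDER BY foo.[DateTime], foo.ID ) AS grp
    FROM foo
)
SELECT MAX(n.ID)         AS ID          -- assumption: ID grows with DateTime inside a run
     , n.sessionID
     , n.userID
     , n.AuditType
     , MAX(n.[DateTime]) AS [DateTime]  -- timestamp of the last row in the run
     , COUNT(*)          AS rows_in_run -- how many rows were collapsed
FROM numbered n
GROUP BY n.userID, n.sessionID, n.AuditType, n.grp
ORDER BY n.sessionID, n.userID, MAX(n.[DateTime]);
```

This form costs a GROUP BY, but it also makes per-run aggregates such as the row count available, which the LEAD() filter does not.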
Answer 1: (score: 0)
For your sample data, you can simply do:
select type, max(id) as id, max(datetime) as datetime
from t
group by type;
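You only need a gaps-and-islands solution if you have interleaved types, that is, if the same type appears in two different groups.
In your case, you are just looking for the last record before a change. You can do that with lead():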
select t.*
from (select t.*,
lead(type) over (order by datetime) as next_type
from t
) t
where next_type is null or next_type <> type;
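This is simpler than most gaps-and-islands problems.
To handle sessions/users, include the appropriate columns in the group by or in the partition by clause.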
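As a rough sketch of that last point, assuming the table t also has user_id and session_id columns (hypothetical names, not given in the question), the two queries in this answer could be extended like this:

```sql
-- Assumption: t has user_id and session_id columns (hypothetical names).

-- Aggregate version: one row per type within each user/session.
select user_id, session_id, type, max(id) as id, max(datetime) as datetime
from t
group by user_id, session_id, type;

-- lead() version: the comparison restarts for each user/session,
-- so the last row of one session is never matched against the next one.
select t.*
from (select t.*,
             lead(type) over (partition by user_id, session_id
                              order by datetime) as next_type
      from t
     ) t
where next_type is null or next_type <> type;
```

The aggregate version still merges separate runs of the same type within a session, so it only fits data where each type occurs in a single run per session.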