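Combine consecutive rows based on a "type" column

Time: 2017-09-26 15:22:16

Tags: sql sql-server tsql sql-server-2012 etl

I'm looking for ideas and T-SQL solutions for combining consecutive records, as in the example below.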

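The source database I'm working with holds audit records and has a column named "Audit_Type". It can contain many different values, such as "Saved Form", "Exported Record", "Imported Record", or "Viewed Record". The database ends up with a lot of extraneous "Saved Form" records, because the application that uses it autosaves the form periodically while the user is editing it. So there are often many "Saved Form" records in a row. Picture this: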

ID     Audit Type        DateTime
1   "Viewed Record"   2017-01-03 11:16:33.000
2   "Saved Form"      2017-01-04 09:51:36.837
3   "Saved Form"      2017-01-04 09:52:40.837
4   "Saved Form"      2017-01-04 09:52:44.837
5   "Saved Form"      2017-01-04 09:52:49.837
6   "Saved Form"      2017-01-04 09:52:54.837
7   "Saved Form"      2017-01-04 09:54:59.837
8   "Exported Record" 2017-01-04 09:55:59.837

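Question 1. Before loading the data into my target database, I'd like to collapse each run of consecutive "Saved Form" records into a single record that uses the timestamp of the last "Saved Form" in the run. Something like this: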

ID     Audit Type        DateTime
1   "Viewed Record"   2017-01-03 11:16:33.000
7   "Saved Form"      2017-01-04 09:54:59.837
8   "Exported Record" 2017-01-04 09:55:59.837

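I've tried a few approaches so far, but I'd like to hear some ideas.

Question 2. From my research and from reading other similar questions on SO, this looks like it could be a gaps-and-islands problem. Is that an accurate description?

EDIT: This is for SQL Server 2012. I'm pulling from a database and have no control over how it logs information.

Also, to clarify: this log table has other columns that I omitted for brevity, so in the example above we can assume that all the "Saved Form" records come from the same session and the same user.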

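2 Answers:

Answer 0: (score: 1)

Gordon beat me to an answer. Yes, this does fit a gaps-and-islands approach. I think LEAD() is a good fit for this problem. But I also tried a second query with ROW_NUMBER(), which produced a slightly shorter execution plan. I'm not sure which would be more efficient at scale; that would need more testing.

Note 1: I also added a hypothetical SessionID and UserID to my queries. Other columns could change your final results.

Note 2: SQL Fiddle reported that the ROW_NUMBER version ran faster with fewer "simultaneous" entries, but the LEAD version was faster with many "simultaneous" entries.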

SQL Fiddle

MS SQL Server 2014 Schema Setup:

CREATE TABLE foo ( ID int IDENTITY, sessionID int, userid int, AuditType varchar(50), [DateTime] datetime ) ;
INSERT INTO foo ( sessionID, userID, AuditType, [DateTime] )
VALUES 
      (1,1,'Viewed Record','2017-01-03 11:16:33.000')
    , (1,1,'Saved Form','2017-01-04 09:51:36.837')
    , (2,2,'Viewed Record','2017-01-04 09:52:00.000')
    , (1,1,'Saved Form','2017-01-04 09:52:40.837')
    , (1,1,'Saved Form','2017-01-04 09:52:44.837')
    , (2,2,'Saved Form','2017-01-04 09:52:45.000')
    , (2,2,'Saved Form','2017-01-04 09:52:46.000')
    , (2,2,'Saved Form','2017-01-04 09:52:47.000')
    , (2,2,'Saved Form','2017-01-04 09:52:48.000')
    , (1,1,'Saved Form','2017-01-04 09:52:49.837')
    , (1,1,'Saved Form','2017-01-04 09:52:54.837')
    , (2,2,'Exported Record','2017-01-04 09:53:00.000')
    , (1,1,'Saved Form','2017-01-04 09:54:59.837')
    , (1,1,'Exported Record','2017-01-04 09:55:59.837')
    , (2,1,'Viewed Record','2017-01-04 10:00:00.000')
    , (2,1,'Saved Form','2017-01-04 10:02:00.000')
    , (2,1,'Saved Form','2017-01-04 10:04:00.000')
    , (2,1,'Saved Form','2017-01-04 10:06:00.000')
    , (2,1,'Exported Record','2017-01-04 10:10:00.000')
;

Query 1 (LEAD()):

SELECT s1.sessionID
  , s1.userID
  , s1.AuditType
  , s1.[DateTime]
FROM (
    SELECT foo.*
      , LEAD(foo.AuditType) OVER ( ORDER BY foo.userID, foo.sessionID, foo.[DateTime] ) AS next_type
    FROM foo
  ) s1
WHERE s1.next_type IS NULL OR s1.next_type <> s1.AuditType
ORDER BY s1.sessionID, s1.userID, s1.[DateTime]

Results:

| sessionID | userID |       AuditType |                 DateTime |
|-----------|--------|-----------------|--------------------------|
|         1 |      1 |   Viewed Record |     2017-01-03T11:16:33Z |
|         1 |      1 |      Saved Form | 2017-01-04T09:54:59.837Z |
|         1 |      1 | Exported Record | 2017-01-04T09:55:59.837Z |
|         2 |      1 |   Viewed Record |     2017-01-04T10:00:00Z |
|         2 |      1 |      Saved Form |     2017-01-04T10:06:00Z |
|         2 |      1 | Exported Record |     2017-01-04T10:10:00Z |
|         2 |      2 |   Viewed Record |     2017-01-04T09:52:00Z |
|         2 |      2 |      Saved Form |     2017-01-04T09:52:48Z |
|         2 |      2 | Exported Record |     2017-01-04T09:53:00Z |

Query 2 (ROW_NUMBER()):

SELECT s1.*
FROM (
    SELECT foo.*
      , ROW_NUMBER() OVER ( PARTITION BY foo.userID, foo.sessionID, foo.AuditType ORDER BY foo.userID, foo.sessionID, foo.[DateTime] DESC ) AS rn
    FROM foo
  ) s1
WHERE rn = 1
ORDER BY s1.sessionID, s1.userID, s1.[DateTime]

Results:

| ID | sessionID | userid |       AuditType |                 DateTime | rn |
|----|-----------|--------|-----------------|--------------------------|----|
|  1 |         1 |      1 |   Viewed Record |     2017-01-03T11:16:33Z |  1 |
| 13 |         1 |      1 |      Saved Form | 2017-01-04T09:54:59.837Z |  1 |
| 14 |         1 |      1 | Exported Record | 2017-01-04T09:55:59.837Z |  1 |
| 15 |         2 |      1 |   Viewed Record |     2017-01-04T10:00:00Z |  1 |
| 18 |         2 |      1 |      Saved Form |     2017-01-04T10:06:00Z |  1 |
| 19 |         2 |      1 | Exported Record |     2017-01-04T10:10:00Z |  1 |
|  3 |         2 |      2 |   Viewed Record |     2017-01-04T09:52:00Z |  1 |
|  9 |         2 |      2 |      Saved Form |     2017-01-04T09:52:48Z |  1 |
| 12 |         2 |      2 | Exported Record |     2017-01-04T09:53:00Z |  1 |

Both should return:

  1,1,'Viewed Record','2017-01-03 11:16:33.000'
  1,1,'Saved Form','2017-01-04 09:54:59.837'
  1,1,'Exported Record','2017-01-04 09:55:59.837'

  2,1,'Viewed Record','2017-01-04 10:00:00.000'
  2,1,'Saved Form','2017-01-04 10:06:00.000'
  2,1,'Exported Record','2017-01-04 10:10:00.000'

  2,2,'Viewed Record','2017-01-04 09:52:00.000'
  2,2,'Saved Form','2017-01-04 09:52:48.000'
  2,2,'Exported Record','2017-01-04 09:53:00.000'
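
A caveat on Query 2, offered as a sketch rather than as part of the tested answer: because it partitions by AuditType, it keeps a single row per type within each user/session. That matches the sample data, but if the same type occurred in two separate runs (say "Saved Form", then "Viewed Record", then "Saved Form" again), the earlier run would be dropped. The usual gaps-and-islands pattern, the difference of two ROW_NUMBER() sequences, keeps each run separate. The table variable @audit and its rows below are made up for illustration.

```sql
-- Hypothetical sample: one user/session in which 'Saved Form' occurs in two separate runs.
DECLARE @audit TABLE (
    ID int IDENTITY(1,1) PRIMARY KEY,
    sessionID int,
    userID int,
    AuditType varchar(50),
    [DateTime] datetime
);

INSERT INTO @audit (sessionID, userID, AuditType, [DateTime])
VALUES
      (1, 1, 'Saved Form',    '2017-01-04T09:51:00')
    , (1, 1, 'Saved Form',    '2017-01-04T09:52:00')
    , (1, 1, 'Viewed Record', '2017-01-04T09:53:00')
    , (1, 1, 'Saved Form',    '2017-01-04T09:54:00')
    , (1, 1, 'Saved Form',    '2017-01-04T09:55:00');

SELECT s.sessionID
  , s.userID
  , s.AuditType
  , MAX(s.[DateTime]) AS LastDateTime   -- timestamp of the last row in the run
  , COUNT(*) AS RowsCollapsed           -- how many rows the run contained
FROM (
    SELECT a.*
      -- Row number within the user/session minus row number within the
      -- user/session/type: the difference stays constant across a run of
      -- consecutive rows of the same type and changes when the run is interrupted.
      , ROW_NUMBER() OVER ( PARTITION BY a.userID, a.sessionID ORDER BY a.[DateTime], a.ID )
      - ROW_NUMBER() OVER ( PARTITION BY a.userID, a.sessionID, a.AuditType ORDER BY a.[DateTime], a.ID ) AS grp
    FROM @audit a
  ) s
GROUP BY s.sessionID, s.userID, s.AuditType, s.grp
ORDER BY s.sessionID, s.userID, LastDateTime;
```

On these five rows the query should return three rows: "Saved Form" at 09:52, "Viewed Record" at 09:53, and "Saved Form" at 09:55. A one-row-per-type query would return only two.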

Answer 1: (score: 0)

For your sample data, you can simply do:

select type, max(id) as id, max(datetime) as datetime
from t
group by type;

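You only need a gaps-and-islands solution if you have interleaved types, that is, if the same type appears in two different groups.

In your case, you are just looking for the last record before a change. You can do that with lead():
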
select t.*
from (select t.*,
             lead(type) over (order by datetime) as next_type
      from t
     ) t
where next_type is null or next_type <> type;

This is simpler than most gaps-and-islands problems.

To handle sessions/users, you would include the appropriate columns in the group by or partition by clause.
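
As a rough sketch of the partition by variant, here is the lead() query with session and user columns added to the window. The @audit table variable, its sessionID/userID columns, and the sample rows are assumptions for illustration, borrowing the column names from the other answer. The extra ID in the order by is only a tie-breaker for identical timestamps.

```sql
-- Hypothetical sample: two sessions whose rows overlap in time.
DECLARE @audit TABLE (
    ID int IDENTITY(1,1) PRIMARY KEY,
    sessionID int,
    userID int,
    AuditType varchar(50),
    [DateTime] datetime
);

INSERT INTO @audit (sessionID, userID, AuditType, [DateTime])
VALUES
      (1, 1, 'Saved Form',      '2017-01-04T09:51:00')
    , (2, 2, 'Saved Form',      '2017-01-04T09:51:30')
    , (1, 1, 'Saved Form',      '2017-01-04T09:52:00')
    , (2, 2, 'Exported Record', '2017-01-04T09:53:00')
    , (1, 1, 'Exported Record', '2017-01-04T09:54:00');

SELECT s.ID, s.sessionID, s.userID, s.AuditType, s.[DateTime]
FROM (
    SELECT a.*
      -- Look at the next row of the same session and user only.
      , LEAD(a.AuditType) OVER ( PARTITION BY a.sessionID, a.userID
                                 ORDER BY a.[DateTime], a.ID ) AS next_type
    FROM @audit a
  ) s
-- Keep a row when it is the last of its session/user or the type changes after it.
WHERE s.next_type IS NULL OR s.next_type <> s.AuditType
ORDER BY s.sessionID, s.userID, s.[DateTime];
```

With partitioning, each row is compared only with the next row of its own session and user, so this should return four rows: the last "Saved Form" and the "Exported Record" of each session. Ordering by datetime alone would compare rows across sessions and drop rows that should be kept.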